build_database_local is a drop-in replacement for build_database, except that it doesn't use Spark or PyWren.
With input_config_big it takes 90s on my 4-core machine, approximately the same time as build_database. With larger database configs (e.g. 10 databases, 4 adducts, 4 modifiers) it is slower: 13 minutes with build_database_local vs 7 minutes with build_database.
However, the big benefit is that it needs only about half as much code. The data is kept in a single unsegmented dataframe, which makes it a lot easier to modify.
When comparing the results to the old output, I found that store_formula_to_id_chunk had a bug that caused it to skip the last formulas_chunk. This PR includes a fix for that bug.
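For illustration only, here is a minimal sketch of one common way a chunked reader can miss its final chunk. It is not the actual code of store_formula_to_id_chunk: the function names, the chunk size, and the sample formulas are all hypothetical, and the real bug may have had a different cause.

```python
from typing import Iterator, List


def iter_chunks_buggy(items: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Hypothetical buggy version: floor division drops a trailing partial chunk."""
    n_chunks = len(items) // chunk_size  # 5 // 2 == 2, so the third chunk is never read
    for i in range(n_chunks):
        yield items[i * chunk_size:(i + 1) * chunk_size]


def iter_chunks_fixed(items: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Hypothetical fixed version: step through start offsets so the tail is included."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Made-up sample data, not taken from any real database config.
formulas = ["H2O", "CO2", "C6H12O6", "NH3", "CH4"]

buggy = [f for chunk in iter_chunks_buggy(formulas, 2) for f in chunk]
fixed = [f for chunk in iter_chunks_fixed(formulas, 2) for f in chunk]

# Formulas that the buggy reader never sees.
missing = [f for f in formulas if f not in buggy]
```

Comparing the flattened output against the full input, as done here with `missing`, is the same kind of check that exposed the problem when comparing the new results to the old output.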