metaspace2020 / Lithops-METASPACE

Lithops-based Serverless implementation of the METASPACE spatial metabolomics annotation pipeline

Are the notebooks updated for the recent databases? #42

Closed gilv closed 5 years ago

gilv commented 5 years ago

I notice that experiment-3-large (and the rest of the notebooks) contains this code:

start_time = datetime.now()

# Build molecular database:
dump_mol_db(config, config['storage']['db_bucket'], 'metabolomics/db/mol_db1.pickle', 22) #HMDB-v4
dump_mol_db(config, config['storage']['db_bucket'], 'metabolomics/db/mol_db2.pickle', 19) #ChEBI-2018-01
dump_mol_db(config, config['storage']['db_bucket'], 'metabolomics/db/mol_db3.pickle', 24) #LipidMaps-2017-12-12
dump_mol_db(config, config['storage']['db_bucket'], 'metabolomics/db/mol_db4.pickle', 26) #SwissLipids-2018-02-02
build_database(config, input_db)
polarity = input_data['polarity']
isocalc_sigma = input_data['isocalc_sigma']
calculate_centroids(config, input_db, polarity, isocalc_sigma)

# Run Annotation Pipeline:
pipeline = Pipeline(config, input_config)
pipeline()
results_df = pipeline.get_results()
images_dict = pipeline.get_images()

finish_time = datetime.now()

However, input_config_huge(2,3).json contains

"metabolomics/db/mol_db6.pickle"

Looking into the db folder, I see

 mol_db5.csv
 mol_db6.txt

It seems things are not updated and not synchronized between the notebooks and the JSON input files. In particular, don't we use mol_db5 anymore? We need to update the code so that it generates mol_db6.pickle and mol_db5.pickle (if this database is still needed).
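To make the mismatch concrete, here is a minimal sketch of a consistency check between the keys the notebook generates and the pickle paths an input JSON file references. The helper names (collect_pickle_paths, missing_databases) and the layout of example_config are assumptions for illustration only; the real input_config_huge files may be structured differently, which is why the sketch walks the whole JSON tree instead of relying on specific field names.

```python
import json


def collect_pickle_paths(node):
    """Recursively collect every string ending in '.pickle' from parsed JSON."""
    found = set()
    if isinstance(node, str):
        if node.endswith('.pickle'):
            found.add(node)
    elif isinstance(node, dict):
        for value in node.values():
            found |= collect_pickle_paths(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_pickle_paths(item)
    return found


def missing_databases(input_config, generated_keys):
    """Return pickle paths referenced by the config but never generated."""
    return sorted(collect_pickle_paths(input_config) - set(generated_keys))


# Keys written by the four dump_mol_db calls quoted in this issue.
generated = [f'metabolomics/db/mol_db{i}.pickle' for i in range(1, 5)]

# Hypothetical config layout (assumption): only the pickle paths matter here.
example_config = json.loads(
    '{"molecular_db": {"databases": ['
    '"metabolomics/db/mol_db1.pickle", '
    '"metabolomics/db/mol_db6.pickle"]}}'
)

missing = missing_databases(example_config, generated)
print(missing)
```

Running a check like this against each input JSON file would list every database that a notebook expects but that no dump_mol_db call produces.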

gilv commented 5 years ago

@omerb01 @LachlanStuart can you handle it please? thanks

LachlanStuart commented 5 years ago

@gilv mol_db5.csv is a small database, made specifically for Experiment 2 to showcase a use-case that's not possible to do efficiently with the serverful implementation: being able to quickly run very small jobs. It is still used in Experiment 2, but it is not big enough to be interesting in other experiments. It's not on the METASPACE servers, so I can't use dump_mol_db, but I can distribute it publicly, so I've committed it to git.

mol_db6.csv cannot currently be distributed publicly, so I can't commit it to git or make it possible to dump it from the METASPACE servers.

gilv commented 5 years ago

@LachlanStuart then I am confused... the input JSON files contain "mol_db6.pickle"... what code should I run to generate this pickle file?