greenelab / ponyo

Software to simulate compendium-wide gene expression data using a VAE.
BSD 3-Clause "New" or "Revised" License
3 stars, 3 forks

shift_template_experiment() function in "simulate_expression_data.py" module #17

Closed: dongbohu closed this issue 3 years ago

dongbohu commented 4 years ago

In this function (https://github.com/greenelab/ponyo/blob/60f00701cf6cf54e92c88ee2100846ed575ed08f/ponyo/simulate_expression_data.py#L421-L431), the first argument is the filename of the normalized data. When the file is large and the function is called in a loop, as in this scenario (https://github.com/greenelab/generic-expression-patterns/blob/f29faf7362b9a01ba563fb35151c495a5043a2d1/human_analysis/nbconverted/2_identify_generic_genes_pathways.py#L142), every iteration has to read the file's contents into memory again. That can take a few minutes per iteration when the file is a compendium of all recount2 SRA data.

We could change the first argument to a pandas DataFrame, so the file is read once and the loaded data is reused across iterations.
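Here is a minimal sketch of the proposed change. It assumes a tab-delimited file with sample IDs in the first column. `load_normalized_data`, `simulate_one_run`, and `READ_COUNT` are hypothetical names used only for illustration and are not ponyo's actual API. The helper passes a DataFrame straight through and parses a file only when it receives a filename (or file-like object), which lets the caller load the compendium once before the loop.

```python
import io

import pandas as pd

# Hypothetical counter: how many times data was parsed from text (illustration only).
READ_COUNT = 0


def load_normalized_data(normalized_data):
    """Hypothetical helper: return a DataFrame, reading from disk only if needed.

    Accepts either an already-loaded pandas DataFrame or a filename/file-like
    object pointing to a tab-delimited file with sample IDs in the first column.
    """
    global READ_COUNT
    if isinstance(normalized_data, pd.DataFrame):
        # Already in memory: no disk I/O or parsing.
        return normalized_data
    READ_COUNT += 1
    return pd.read_csv(normalized_data, sep="\t", index_col=0, header=0)


def simulate_one_run(normalized_data, run):
    """Hypothetical stand-in for one loop iteration of the simulation.

    It only loads the data and reports its shape; the real function's
    template-shifting logic is not reproduced here.
    """
    data = load_normalized_data(normalized_data)
    return run, data.shape


# Toy stand-in for a large normalized compendium file.
tsv = "sample\tgeneA\tgeneB\nS1\t0.1\t0.5\nS2\t0.3\t0.9\n"

# Load once, outside the loop...
compendium = load_normalized_data(io.StringIO(tsv))
# ...then pass the DataFrame into every iteration.
results = [simulate_one_run(compendium, run) for run in range(5)]
print(READ_COUNT)  # 1
```

One reason to accept both types rather than only a DataFrame is that existing filename-based calls keep working, while loop-heavy callers can skip the repeated reads.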

ajlee21 commented 3 years ago

Addressed in https://github.com/greenelab/ponyo/pull/20