skinniderlab / CLM

MIT License
0 stars 0 forks source link

only DataFrame input supported in write_to_csv_file #240

Closed vineetbansal closed 2 weeks ago

vineetbansal commented 1 month ago

Removed support for non-DataFrame inputs to write_to_csv_file. Only used at a single place in our code (add-carbon), which has been changed accordingly.

Also changed a part of the code in write_structural_prior that was relying on the delimiters on these carbon files being <tab>s (through r"\s"). All other places we read this file, we use read_csv_file which leaves it to pandas to determine the delimiter.

The changes in the test csv files are <tab>->, conversion (probably too large to diff on github).

skinnider commented 1 month ago

Looks reasonable to me - should I try switching branches and running this on one of the failing job before merging the PR, or should the test suite cover that?

vineetbansal commented 1 month ago

@skinnider - I'm 99% confident on the CI having given us the green-light on this, but I think if it's not too much hassle on your end, you could try switching branches before merging this.

We can ignore the coverage reduction failure for this one.

skinnider commented 1 month ago

got it, will do shortly. btw, I am still seeing version 'vsrc' printed to the command line whenever I run the Snakemake workflow. Was the conclusion ultimately that this doesn't really matter?

vineetbansal commented 1 month ago

for vsrc, I think we do want to investigate why it's happening. Let me open a new issue on this.

skinnider commented 2 weeks ago

@vineetbansal Sorry for the delay - I switched to vb/write_csv_onlydf and ran one of the commands from issue #234 that previously was failing:

/Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100
snakemake --configfile /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/config.yaml \
  --slurm --jobs 50 --default-resources slurm_partition=main,hoppertest,skinniderlab \
  --latency-wait=60 --rerun-incomplete

Took a while but it ran through without any issue. So, as far as I can tell, this is safe to merge.

As we discussed on Thurs - I also took a closer look at some of the Snakemake runs that were failing and it seems like most of the failing jobs are from the LinAlgError in calculate_outcomes. I admit I haven’t looked carefully, but wonder if it’s also possible that these issues are coming from the AddCarbon code? (e.g., if only a ~hundred rows of the AddCarbon file are being read in, and they all have the exact same TPSA, would this explain why the matrix decomposition is erroring?)