open2c / quaich

snakemake pipeline for Hi-C post-processing
MIT License
22 stars 6 forks source link

Feature added (data tables): get_cool_data and get_bed_data rules. #2

Closed agalitsyna closed 3 years ago

agalitsyna commented 3 years ago
Phlya commented 3 years ago

Awesome, thanks a lot Sasha!

Two questions.

Do you think local cooler files should be copied from specified locations? I like the tidiness and reproducibility of that, but they were not copied before to avoid data duplication. I think Max previously wanted to run quaich on loads of cool files, and wanted to avoid having to copy them - which I understand, high resolution cooler files from deep sequenced experiments take up quite a lot of space. An option is to try hard-linking instead of copying, if the user prefers that. Soft linking doesn't play well with HDF5 for some reason.

Have you tried running it? You fixed some ugly whitespace I had to introduce, because for whatever reason snakemake wouldn't run for me with nice and tidy tabs. Does it run for you?

agalitsyna commented 3 years ago
  1. Good point, I've removed the opportunity to copy.

  2. I don't run the pipeline fully since I don't need mustache and many other things. Basically, it runs without syntax errors. What error do you get?

To make it aligned with snakemake conventions, I've run the code through snakefmt. You can check it now and improve the lines that cause inconsistency in your system.

  1. With the sample CTCF bed from the OSF file, I get a strange error from coopup.py. When the pipeline runs for you, please, test the default tables. Is this something unnatural to have float instead of int as a score in bed file?

Namespace(anchor=None, baselist='inputs/beds/CTCF_test1.bed', bed2=None, bed2_ordered=True, by_window=False, coolfile='inputs/coolers/test.mcool::resolutions/1000', coverage_norm=False, excl_chrs='chrY,chrM', expected=None, ignore_diags=2, incl_chrs='all', local=False, logLevel='INFO', maxdist=None, maxshift=1000000, maxsize=None, mindist=None, minshift=100000, minsize=None, n_proc=5, nshifts=1, outdir='results/pileups/beds', outname='test1-1000_over_CTCF_test1.bed_1-shifts.np.txt', pad=100, post_mortem=False, rescale=False, rescale_pad=1.0, rescale_size=99, save_all=False, seed=None, subset=0, unbalanced=False, weight_name='weight') Traceback (most recent call last): File "/home/agalicina/anaconda3/envs/quaich/bin/coolpup.py", line 8, in sys.exit(main()) File "/home/agalicina/anaconda3/envs/quaich/lib/python3.8/site-packages/coolpuppy/main.py", line 409, in main CC = CoordCreator( File "/home/agalicina/anaconda3/envs/quaich/lib/python3.8/site-packages/coolpuppy/coolpup.py", line 353, in init self.process() File "/home/agalicina/anaconda3/envs/quaich/lib/python3.8/site-packages/coolpuppy/coolpup.py", line 704, in process self.bases, self.kind = self.auto_read_bed(self.baselist) File "/home/agalicina/anaconda3/envs/quaich/lib/python3.8/site-packages/coolpuppy/coolpup.py", line 389, in auto_read_bed int(row1[4]), ValueError: invalid literal for int() with base 10: '10.240026710177574'

Phlya commented 3 years ago

Cool, I couldn't run snakefmt because of this: https://github.com/snakemake/snakefmt/issues/89

Right, coolpuppy doesn't expect any columns apart from chrom, start end in a bed file, because it uses the number of columns to guess whether it's a bed or bedpe file, so it has to be strictly 3 or 6 columns. I guess our test file happened to have 6 columns, but they were not correct for bedpe...

agalitsyna commented 3 years ago

Ah,force=Trueonly enforces the logging regime. If you remove this argument, it won't change the output of snakefmt...

Phlya commented 3 years ago

Yeah I know, but I couldn't be bothered to change the source code, just let them fix it and then will install when the PR is merged.

agalitsyna commented 3 years ago

Okay, maybe we will transition to Python 3.8 before they fix this single argument. Does the formatted version work with your snakemake environment?

Phlya commented 3 years ago

I'll give it in a bit and will report, thanks!

Phlya commented 3 years ago

It runs for me, thanks! Need to sort out the columns of the test CTCF file then...

Phlya commented 3 years ago

I pushed some small changes to master that broke the merging. I think I fixed the conflicts though.

Phlya commented 3 years ago

I think this is good to merge, any thoughts @agalitsyna?

Phlya commented 3 years ago

I added another feature that I forgot to push earlier, but will just merge now. Thanks again!