jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Errors in loadSQM #597

Closed Lafmas closed 1 year ago

Lafmas commented 1 year ago

Hello

When I run loadSQM on my analysis, the following error occurs.

Loading total reads
Loading orfs
    table...
    abundances...
    sequences
    taxonomy...
Loading contigs
    table...
    abundances...
    sequences...
    taxonomy...
    binning info...
Loading bins
    table...
    abundances...
    taxonomy...
Loading taxonomies
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  line 1 did not have 11 elements

Does this mean my raw analysis results have a problem? My syslog file indicates that the analysis finished successfully... (Have fun!)

SqueezeMeta and SQMtools are both the latest version (1.6.0), and I run R inside the conda environment.

thank you
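
The scan() error above is what R reports when a line of a delimited table has fewer fields than expected. A minimal sketch for checking this outside R is shown here, assuming the tables are plain tab-separated text; the function name `field_count_report` is hypothetical and is not part of SqueezeMeta or SQMtools. It reports every line whose field count differs from the most common one in the file.

```python
import csv
from collections import Counter


def field_count_report(tsv_path):
    """Return (expected_fields, mismatches) for a tab-separated file.

    expected_fields is the most common number of fields per line.
    mismatches is a list of (line_number, n_fields) for every line
    whose field count differs from expected_fields.
    Hypothetical helper, not part of SqueezeMeta or SQMtools.
    """
    counts = []
    with open(tsv_path, newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, row in enumerate(reader, start=1):
            counts.append((line_number, len(row)))
    if not counts:
        return 0, []
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    mismatches = [(ln, n) for ln, n in counts if n != expected]
    return expected, mismatches
```

Calling `field_count_report("/path/to/project/results/tables/PLAblending.superkingdom.nofilter.abund.tsv")` and finding a non-empty mismatch list would point to a malformed table. Note that a header written without a leading empty field would also be flagged, so a single mismatch on line 1 is not necessarily an error.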

fpusan commented 1 year ago

Any chance you can share your project with us? I would need the following:
1) The SqueezeMeta_conf.pl file inside the project's directory.
2) The results directory inside the project's directory.
3) The intermediate directory inside the project's directory.
Can you compress them and share them with us via e.g. Google Drive or WeTransfer?

Lafmas commented 1 year ago

Sure. I'm compressing them now.

I'll comment here once I have sent the compressed file to you.

Is a .zip file OK?

thank you

Lafmas commented 1 year ago

OMG, the compressed file is 35.5 GB... That is too large to share via Google Drive or WeTransfer with my account.

Could you suggest another way to share my project? I'll do my best.

thank you

Lafmas commented 1 year ago

Since the raw data may be the problem, I restarted my project from the suspected step using the --force_overwrite option.

If it works, I'll comment here again.

thank you

fpusan commented 1 year ago

35.5 GB compressed seems like a lot. Was this only the results and intermediate directories? Most of the disk usage should be in the data folder, which I don't need. Otherwise, can you just send me the results/tables directory?

Lafmas commented 1 year ago

The results/tables directory is still large... about 26 GB.

Is that the right size?

fpusan commented 1 year ago

I would expect it to be at least an order of magnitude smaller. Can you get me the detailed sizes of the files in that directory?
du -h /path/to/project/results/tables

Lafmas commented 1 year ago

du -sh /analysis/users/wycho/Project/20221201_PLA-blending/squeezemeta/PLAblending/results/tables/
26G /analysis/users/wycho/Project/20221201_PLA-blending/squeezemeta/PLAblending/results/tables/

Here it is.

The output of ll -h for the results is below.

total 26G
@@@ 4.0K Dec 11 04:51 ..
@@@ 1.9M Dec 9 16:13 PLAblending.species.nofilter.abund.tsv
@@@ 4.0K Dec 9 16:13 .
@@@ 946K Dec 9 16:13 PLAblending.species.prokfilter.abund.tsv
@@@ 917K Dec 9 16:13 PLAblending.species.allfilter.abund.tsv
@@@ 689K Dec 9 16:12 PLAblending.genus.nofilter.abund.tsv
@@@ 534K Dec 9 16:12 PLAblending.genus.prokfilter.abund.tsv
@@@ 514K Dec 9 16:11 PLAblending.genus.allfilter.abund.tsv
@@@ 279K Dec 9 16:11 PLAblending.family.nofilter.abund.tsv
@@@ 235K Dec 9 16:11 PLAblending.family.prokfilter.abund.tsv
@@@ 233K Dec 9 16:10 PLAblending.family.allfilter.abund.tsv
@@@ 136K Dec 9 16:10 PLAblending.order.nofilter.abund.tsv
@@@ 110K Dec 9 16:09 PLAblending.order.prokfilter.abund.tsv
@@@ 109K Dec 9 16:09 PLAblending.order.allfilter.abund.tsv
@@@ 63K Dec 9 16:09 PLAblending.class.nofilter.abund.tsv
@@@ 50K Dec 9 16:08 PLAblending.class.prokfilter.abund.tsv
@@@ 49K Dec 9 16:08 PLAblending.class.allfilter.abund.tsv
@@@ 17K Dec 9 16:08 PLAblending.phylum.nofilter.abund.tsv
@@@ 12K Dec 9 16:07 PLAblending.phylum.prokfilter.abund.tsv
@@@ 12K Dec 9 16:07 PLAblending.phylum.allfilter.abund.tsv
@@@ 341 Dec 9 16:07 PLAblending.superkingdom.nofilter.abund.tsv
@@@ 297 Dec 9 16:06 PLAblending.superkingdom.prokfilter.abund.tsv
@@@ 297 Dec 9 16:06 PLAblending.superkingdom.allfilter.abund.tsv
@@@ 27K Dec 9 16:06 PLAblending.bin.tax.tsv
@@@ 2.8G Dec 9 16:06 PLAblending.orf.tax.prokfilter.tsv
@@@ 3.0G Dec 9 16:05 PLAblending.orf.tax.nofilter.tsv
@@@ 8.2G Dec 9 16:04 PLAblending.contig.sequences.tsv
@@@ 2.9G Dec 9 16:03 PLAblending.orf.sequences.tsv
@@@ 2.1G Dec 9 16:02 PLAblending.contig.tax.nofilter.tsv
@@@ 1.9G Dec 9 16:02 PLAblending.contig.tax.prokfilter.tsv
@@@ 1.9G Dec 9 16:01 PLAblending.contig.tax.allfilter.tsv
@@@ 2.8G Dec 9 15:59 PLAblending.orf.tax.allfilter.tsv
@@@ 699K Dec 9 15:43 PLAblending.KO.tpm.tsv
@@@ 645K Dec 9 15:43 PLAblending.KO.cov.tsv
@@@ 584K Dec 9 15:43 PLAblending.KO.bases.tsv
@@@ 331K Dec 9 15:43 PLAblending.KO.abund.tsv
@@@ 2.5M Dec 9 15:43 PLAblending.KO.names.tsv

thank you

fpusan commented 1 year ago

That's a big project! OK, I think I can manage with the files whose names contain a taxonomic rank (species, genus, family, order, class, phylum, superkingdom). Those should be small enough.

Lafmas commented 1 year ago

I sent you an email via Gmail with the files attached.

Thank you!

fpusan commented 1 year ago

I can reproduce the bug.
1) What command did you use to run SqueezeMeta?
2) Can you share your samples file with me?
3) Can you share the files results/06.*.fun3.tax.noidfilter.wranks, results/06.*.fun3.tax.wranks and results/19.*.contigtable?

Lafmas commented 1 year ago
  1. Here is the command I used when I first started SqueezeMeta:
     SqueezeMeta.pl -m coassembly -p PLAblending -s PB.2022.12.sample.name.tab -f ../seq/shotgun/ -t 50 --doublepass --nocog --nopfam --euk
     I am now restarting my project with nocog=0 and nopfam=0 (set by editing SqueezeMeta_conf.pl directly).

  2. Which samples file do you want?

  3. I sent you those files by Gmail.

thank you!

fpusan commented 1 year ago

The samples file would be PB.2022.12.sample.name.tab

fpusan commented 1 year ago

Can you run sqm2tables.py manually and see if the bug persists? Inside the conda environment, run
sqm2tables.py /path/to/project /path/to/project/test_tables
And then, from the test_tables directory, paste the content of PLAblending.superkingdom.nofilter.abund.tsv here.

Lafmas commented 1 year ago

This is the content of PLAblending.superkingdom.nofilter.abund.tsv.

  | PB_Blank_1 | PB_Blank_2 | PB_PLA_1 | PB_PLA_2 | PB_Methoxy_1 | PB_Methoxy_2 | PB_Ethoxy_1 | PB_Ethoxy_2 | PB_Acethoxy_1 | PB_Acethoxy_2
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
k_Archaea | 656374 | 571406 | 673225 | 581865 | 513231 | 620746 | 579462 | 576583 | 476438 | 515757
k_Bacteria | 42398633 | 43327447 | 43494834 | 40417332 | 41103289 | 45207990 | 41923672 | 45421253 | 43848854 | 46493322
k_Eukaryota | 18957 | 18728 | 18751 | 16983 | 16834 | 19021 | 19280 | 19731 | 19914 | 22757
k_No CDS | 54302 | 55246 | 55936 | 51456 | 50787 | 56774 | 52073 | 57237 | 56074 | 60569
k_Sym plasmid (no superkingdom in NCBI) | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0
k_Unclassified | 1862680 | 1783163 | 1907031 | 1734093 | 1680006 | 1867134 | 1729679 | 1865039 | 1715575 | 1837524
k_Unmapped | 32595477 | 33247575 | 33289354 | 31834384 | 30128838 | 33238846 | 31555242 | 33720422 | 33345034 | 34598364
k_Viruses | 5329 | 5193 | 5457 | 5459 | 6333 | 8053 | 21786 | 23987 | 19407 | 19403

thank you!

fpusan commented 1 year ago

This is promising. The first batch of tables you sent me seemed corrupted, as only the first two samples had abundances. But this table seems to be OK. You can try replacing the files in the results/tables directory with the files in test_tables:
1) rm -r /path/to/project/results/tables
2) mv /path/to/project/test_tables /path/to/project/results/tables

And then try loadSQM again.
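
For anyone who would rather keep the old tables until loadSQM is confirmed to work, the two steps above can be sketched in Python with a rename in place of the rm -r. This is only an illustration under stated assumptions: the function `swap_tables` and the backup name `tables.bak` are hypothetical and are not part of SqueezeMeta, and the paths follow the placeholders used in the comment above.

```python
import shutil
from pathlib import Path


def swap_tables(project_dir, new_tables_dir, backup_name="tables.bak"):
    """Replace <project>/results/tables with new_tables_dir, keeping a backup.

    Hypothetical helper, not part of SqueezeMeta. Instead of deleting the
    old tables, it renames them to <project>/results/<backup_name>.
    Returns the backup path, or None if there was no old tables directory.
    """
    results = Path(project_dir) / "results"
    tables = results / "tables"
    backup = results / backup_name
    new_tables = Path(new_tables_dir)

    if not new_tables.is_dir():
        raise FileNotFoundError(f"new tables directory not found: {new_tables}")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")

    made_backup = None
    if tables.exists():
        # Step 1 (instead of rm -r): move the old tables out of the way.
        tables.rename(backup)
        made_backup = backup
    # Step 2: move the regenerated tables into place.
    shutil.move(str(new_tables), str(tables))
    return made_backup
```

A call such as `swap_tables("/path/to/project", "/path/to/project/test_tables")` would leave the regenerated tables in results/tables and the previous ones in results/tables.bak, which can be deleted once the project loads correctly.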

Lafmas commented 1 year ago

It works well. I'm now able to import my project using loadSQM .

Other functions also work well (plotTaxonomy, plotFunction, etc.).

I really appreciate your dedication.

Thank you!

fpusan commented 1 year ago

Glad to hear, closing issue!