jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Duplicate row.names when using loadSQM #893

Closed: SamBrutySci closed this issue 1 month ago

SamBrutySci commented 1 month ago

Hi,

I have a project which I recently exported using sqm2zip, and I'm trying to load it into R, but whether I use the zip file or the original project folder I get this:

Proj1 <- loadSQM("Proj1.zip", tax_mode = "prokfilter", engine = "data.table")
Loading total reads
Loading orfs table...
  |--------------------------------------------------|
  |==================================================|
Error in .rowNamesDF<-(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘megahit_1_1-411’, ‘megahit_1_424-639’, ‘megahit_10_37-309’, ‘megahit_100_25-276’, ‘megahit_100_280-432’, ‘megahit_1000_2-532’, ‘megahit_10000_2-646’, ‘megahit_10000_713-838’, ‘megahit_100000_3-1235’, ‘megahit_1000000_1-378’, ‘megahit_1000001_2-457’, ‘megahit_1000002_2-370’, ‘megahit_1000003_287-439’, ‘megahit_1000004_2-448’, ‘megahit_1000005_1-318’, ‘megahit_1000006_3-536’, ‘megahit_1000007_1-423’, ‘megahit_1000008_2-409’, ‘megahit_1000009_136-345’, ‘megahit_100001_3-626’, ‘megahit_1000010_1-207’, ‘megahit_1000011_1140-1634’, ‘megahit_1000011_254-1132’, ‘megahit_1000011_3-257’, ‘megahit_1000012_1-630’, ‘megahit_1000013_2-673’, ‘megahit_1000014_12-560’, ‘megahit_1000015_114-527’, ‘megahit_1000015_3-113’, ‘megahit_1000016_3-398’, ‘megahit_1000017_1-309’, ‘megahit_1000018_3-308’, ‘megahit_1000019_3-410’, ‘megahit [... truncated]

Any ideas?

Thanks!

fpusan commented 1 month ago

Can you share the zip file with me? I can check.

SamBrutySci commented 1 month ago

Thanks! Here's a link to the zip file on Google Drive; hopefully that works! I couldn't think of an easier way.

https://drive.google.com/file/d/1vqtEVuPLnbpq1MeksIrSayr2PiZEJX5y/view?usp=sharing

fpusan commented 1 month ago

Ok, somehow all ORFs are present 4 times in your table instead of once... Each duplicate row seems to contain reads for only one sample. An example for one ORF looks like this:

                  Raw.read.count.JI0015.1 Raw.read.count.JI0015.2
megahit_1_1-411                         3                       0
megahit_1_1-411.1                       0                       2
megahit_1_1-411.2                       0                       0
megahit_1_1-411.3                       0                       0
                  Raw.read.count.JI0015.3 Raw.read.count.JI0015.4
megahit_1_1-411                         0                       0
megahit_1_1-411.1                       0                       0
megahit_1_1-411.2                       1                       0
megahit_1_1-411.3                       0                       1
                  Raw.read.count.JI0015.5 Raw.read.count.JI0015.6
megahit_1_1-411                         0                       0
megahit_1_1-411.1                       0                       0
megahit_1_1-411.2                       0                       0
megahit_1_1-411.3                       0                       0

Other elements of the table (e.g. taxonomic and functional annotation) are identical for the repeated ORFs, as they should be. This is my first time seeing this, and the project seems to have been run with the latest version... @SamBrutySci did you stop and restart this run somehow, or change the parameters midway? @jtamames any insight on why this may be happening?
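
If you want to reproduce the diagnosis yourself, the sketch below counts repeated ORF IDs directly in the project's raw ORF table. The file name (results/13.Proj1.orftable) and the single leading comment line (hence skip = 1) are assumptions based on a default SqueezeMeta project layout, so adjust them to whatever your project actually contains:

# Illustrative check only; the orftable path and skip = 1 are assumptions (see note above).
orftab <- read.delim("Proj1/results/13.Proj1.orftable",
                     skip = 1, quote = "", check.names = FALSE)
sum(duplicated(orftab[[1]]))                       # > 0 in the affected project, 0 in a healthy one
head(sort(table(orftab[[1]]), decreasing = TRUE))  # here each ORF ID appears 4 times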

SamBrutySci commented 1 month ago

Yes, the run was interrupted a couple of times by HPC upgrades taking nodes down! The parameters should have all been consistent when restarting each time, however. I just restarted using the --restart flag.

Is this fixable with the current run or shall I just re-run from a certain step?

fpusan commented 1 month ago

Samples JI0015.5 and JI0015.6 have no counts assigned to any ORF, so I suspect the run got interrupted during the mapping step. To be safe, I would restart from step 10, forcing overwrite.
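
Once the rerun finishes, a quick sanity check along these lines should confirm that both symptoms are gone (a sketch only; the slot names follow the SQM object as documented in SQMtools):

library(SQMtools)
# Reload the reprocessed project and re-check the duplicated row names and the empty samples.
Proj1 <- loadSQM("Proj1.zip", tax_mode = "prokfilter", engine = "data.table")
any(duplicated(rownames(Proj1$orfs$table)))  # should now be FALSE
colSums(Proj1$orfs$abund)                    # every sample, including JI0015.5 and JI0015.6, should have reads assigned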

SamBrutySci commented 1 month ago

Thanks so much for your help! Restarting at step 10 forcing overwrite has fixed the issue!