sjdlabgroup / SAHMI


Errors when running the taxa_counts.r function #12

Open imMMQ opened 1 year ago

imMMQ commented 1 year ago

I am having trouble running taxa_counts.r. Here is the error message.

[1] "Started extracting barcode data from fastq files for T117"
Error in validObject(.Object) :
  invalid class “SRFilterResult” object: superclass "Mnumeric" not defined in the environment of the object's class
Calls: [ ... SRFilterResult -> new -> initialize -> initialize -> validObject
Execution halted

I'm not sure whether this is due to the format of the sequencing file, so the first few lines of the FASTA file are listed below.

A00728:798:HGCG7DSX5:3:1663:21649:36511 kraken:taxid|9606
TTTCTTATATGGGGGCCTCTCTGCGCCTGCGCCGGCGCGGCGCCTTTGCGACGGCGGAGTTGCGTTCTCCTCAGCACAGACCCGGAGAGCACCGCGAGGGCGGAGCTGCGTTCTCCTCTGCACAG
A00728:798:HGCG7DSX5:3:1360:11306:28494 kraken:taxid|9606
TTTCTTATATGGGATTCCTGGGTTTAAAAGTAAAAAAATAAATATGTTTAATTTGTGAACTGATTACCATCAGAATTGTACTGTTCTGTATCCCACCAGCAATGTCTAGGAATGCCTGTTTCTCC
A00728:798:HGCG7DSX5:3:2662:6677:2315 kraken:taxid|9606
TTTCTTATATGGGATCACCAGCTGCTCCGTTCTACCATTTCTTCAGCCCTCTTGGCTGTGCCTGCGGCTCTGCCCCTCCCGTCTCTGCACCTACCACCCAGAGAGGGCTTGTTGAGCTCAGAGAT

Because of the same concern about the sequencing file, I tried another sample, listed here.

SRR20330330.1 1 length=150 kraken:taxid|9606
TNTGGAGGTATATGGACCGTTTTATCCATAATTATTTTTAATATATTTTTTTTTTTTTAAATTACAAATACATATTATTAAAATTTTTATAAAACAGTACACAGAAGCCAAATTTTTACACATAATATAATAACCTACCAACACATTTTA
SRR20330330.3 3 length=150 kraken:taxid|9606
GNCAGCCGTGAACCGATGGCTCTGATTCTAAAAAATAATAATATTTTTTTTTTTTTTTTGGGAGGCTGTGTGGGGGGGATCGGTTGCGCTCGGGAGTTTTAGCCCCCCCTGGGCACCATGGCAAAAACCCAACACAACAAAAAAAAAAAA
SRR20330330.4 4 length=150 kraken:taxid|9606
ANTTTCCAGATGAAGGTTTCCAAATTTGAATTTTATTATGTATATTTTTTTTTTTCTTTTAGACTGAGTCTTTCTCTGTTGCCCCGGCTGGAGTCTAGTTGCGTTGTCTCGGCTCACTCAAACCCCCACCCTCCACGTACAACTGATTCT

I got an error again, but not the same one as on the first try.

[1] "Started extracting barcode data from fastq files for CA_HPV"
Error in .Call2("XStringSet_letter_frequency", x, collapse, codes, baseOnly, :
  long vectors not supported yet: memory.c:3782
Calls: [ ... .local -> .XStringSet.nucleotide_frequency -> .Call2
Execution halted
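"long vectors not supported yet" is the message R raises when a vector with more than 2^31 - 1 elements reaches code that does not handle long vectors. Whether the input size is really the trigger here is an assumption, but it is cheap to check. The following is a minimal Python sketch, assuming a FASTA file whose header lines start with ">"; the helper name `fasta_totals` and the two-record sample are hypothetical stand-ins for a real file.

```python
import io

R_MAX_VECTOR_LEN = 2**31 - 1  # largest length of a regular (non-long) R vector


def fasta_totals(handle):
    """Return (number of records, total bases) for a FASTA-formatted stream."""
    n_records = 0
    n_bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_records += 1  # header line opens a new record
        else:
            n_bases += len(line)  # sequence line (may be wrapped)
    return n_records, n_bases


# Hypothetical two-record input standing in for a real FASTA file.
sample = io.StringIO(
    ">read1 kraken:taxid|9606\nACGTN\n>read2 kraken:taxid|9606\nACG\n"
)
records, bases = fasta_totals(sample)
print(records, bases, bases > R_MAX_VECTOR_LEN)
```

For a real file, pass `open(path)` instead of the `io.StringIO` sample. If the total number of bases is above the limit, splitting the input into smaller chunks before running the R script is one thing to try.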

There is no sample taxa file, so I'm not sure whether the file I passed to --taxa is correct. Its first few lines are listed here, just in case.


taxid.x | r | p | name | r1 | r2 | r3 | p1 | p2 | p3 | rank | CLrpmm | taxid.y | rpmm
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
27291 | 1 | 0 | Saccharomyces paradoxus | 0.8 | 0.6 | 0.8 | 0.33333333 | 0.41666667 | 0.33333333 | S | 939.562791 | 27291 | 6.67252069
28985 | 0.99998343 | 0 | Kluyveromyces lactis | 0.82857143 | 0.82857143 | 1 | 0.05833333 | 0.05833333 | 0.00277778 | S | 5744.24666 | 28985 | 218.858679
76775 | 0.8665019 | 1.02E-103 | Malassezia restricta | 0.57142857 | 0.89285714 | 0.53571429 | 0.2 | 0.01230159 | 0.23571429 | S | 14610.0636 | 76775 | 1698.82377
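Before passing a table like this to any script, it can help to confirm that every row has as many fields as the header and that the taxid columns hold matching integers. The sketch below is a minimal Python check, not a statement of what taxa_counts.r requires; the helper `parse_pipe_table` is hypothetical, and the sample keeps only four of the columns shown above to stay short.

```python
def parse_pipe_table(text):
    """Parse a pipe-delimited table into a list of dicts keyed by the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [cell.strip() for cell in ln.split("|")]
        # Skip the "-- | -- | ..." separator row.
        if all(cell and set(cell) == {"-"} for cell in cells):
            continue
        if len(cells) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(cells)}: {ln!r}"
            )
        rows.append(dict(zip(header, cells)))
    return rows


# Trimmed copy of the table above (four of the fourteen columns).
table = """\
taxid.x | name | rank | taxid.y
-- | -- | -- | --
27291 | Saccharomyces paradoxus | S | 27291
28985 | Kluyveromyces lactis | S | 28985
76775 | Malassezia restricta | S | 76775
"""

rows = parse_pipe_table(table)
taxids = [int(r["taxid.x"]) for r in rows]  # raises if a taxid is not an integer
mismatched = [r["name"] for r in rows if r["taxid.x"] != r["taxid.y"]]
print(taxids, mismatched)
```

A row with a missing or extra field raises a `ValueError` naming the offending line, which is usually easier to act on than a downstream failure.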

CongjiaMa commented 1 year ago

Hello, may I ask whether you have solved this problem? I am also learning this tool. If it is convenient for you, could you contact me by email (928508627@qq.com)? How did you resolve the q_df error before running the final taxa_counts.r?

imMMQ commented 1 year ago

I tried a new taxa file containing only the names I wanted to extract, but I still got errors. Here is the taxa file: taxa_test.tsv.txt

Here are the error messages. I tested sequences from three patients, each from an independent study, so the sequence formats look a little different. The FASTA sequences for "CA_HPV" and "T117" after the "Extract microbiome reads" step are listed above. The "PBMC" FASTA is listed below, together with its error message.

[1] "Started extracting barcode data from fastq files for CA_HPV"
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'i' in selecting a method for function '[': long vectors not supported yet: memory.c:3887
Calls: [ ... .XStringSet.nucleotide_frequency -> .Call2 -> .handleSimpleError -> h
Execution halted

[1] "Started extracting barcode data from fastq files for T117"
Error in $<-.data.frame(*tmp*, umi, value = 1) :
  replacement has 1 row, data has 0
Calls: $<- -> $<-.data.frame
Execution halted

The following are the "PBMC" sequences and the corresponding error message.

A00228:279:HFWFVDMXX:1:1101:4110:1063 1:N:0:ACATTACT kraken:taxid|9606
TGGGCTGGTCGCGGTTCATGGACATTCG
A00228:279:HFWFVDMXX:1:1101:7509:1063 1:N:0:ACATTACT kraken:taxid|9606
CATGCTCGTCTCTCACACTTTTTGGCAA
A00228:279:HFWFVDMXX:1:1101:15845:1063 1:N:0:ACATTACT kraken:taxid|9606
CACTGGGAGTTACTCGTTTTCTGTGGTT

[1] "Started extracting barcode data from fastq files for PBMC"
Error in $<-.data.frame(*tmp*, umi, value = 1) :
  replacement has 1 row, data has 0
Calls: $<- -> $<-.data.frame
Execution halted
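"replacement has 1 row, data has 0" is what R reports when a column is assigned to a data frame that has zero rows. One possible explanation, which is an assumption and not something confirmed in this thread, is that no reads matched the requested taxa: every read pasted in this issue is labelled kraken:taxid|9606. The Python sketch below counts reads per taxid from header lines and intersects them with a set of wanted taxids; `taxid_counts` is a hypothetical helper, and `wanted` uses the three taxids from the first taxa table posted above.

```python
import re
from collections import Counter

# Matches the Kraken label appended to each read header, e.g. "kraken:taxid|9606".
TAXID_RE = re.compile(r"kraken:taxid\|(\d+)")


def taxid_counts(header_lines):
    """Count reads per taxid, using the kraken:taxid|N label in each header."""
    counts = Counter()
    for line in header_lines:
        match = TAXID_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Headers of the three PBMC reads quoted above.
headers = [
    "A00228:279:HFWFVDMXX:1:1101:4110:1063 1:N:0:ACATTACT kraken:taxid|9606",
    "A00228:279:HFWFVDMXX:1:1101:7509:1063 1:N:0:ACATTACT kraken:taxid|9606",
    "A00228:279:HFWFVDMXX:1:1101:15845:1063 1:N:0:ACATTACT kraken:taxid|9606",
]
wanted = {27291, 28985, 76775}  # taxids from the first taxa table in this issue

counts = taxid_counts(headers)
matched = {taxid: n for taxid, n in counts.items() if taxid in wanted}
print(counts)
print(matched)
```

An empty `matched` result on the full file would be consistent with the zero-row data frame in the error, although the first few lines of a file are not necessarily representative of the rest.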

CongjiaMa commented 1 year ago

Thank you, received.

CongjiaMa commented 1 year ago

taxa_counts.r has some bugs, so I have modified the script. Here is my version; good luck. As for the errors, I suggest examining the data you processed above: open the files and you may find a different approach.


tuqiang2014 commented 5 months ago

Step 7: Quantitation of microbes and creating the barcode-metagenome counts matrix. What format does the taxa parameter need, and where can I download the file? Thanks.