yyoshiaki / VIRTUS2

A bioinformatics pipeline for viral transcriptome detection and quantification considering splicing.

Extracted reads don't match on BLAST (nr) #38

Closed by sirrgang 1 month ago

sirrgang commented 5 months ago

Thanks for the pipeline.

My virus output file looks like this:

rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
NC_002077.1_Adeno-associatedvirus-_1,_complete_genome 1 4718 514 394 8.351 10.8207 35.2 237
NC_015521.1_Cutthroat_trout_virus,_complete_genome 1 7310 4 43 0.588235 0.0225718 33.9 3
NC_004102.1_Hepatitis_C_virus_genotype_1,_complete_genome 1 9646 2979 276 2.86129 15.0274 33.2 195
NC_009823.1_Hepatitis_C_virus_genotype_2,_complete_genome 1 9711 2139 298 3.06868 11.6063 31.8 194
NC_009827.1_Hepatitis_C_virus_genotype_6,_complete_genome 1 9628 108 77 0.799751 0.568342 32.6 187
NC_001405.1_Human_adenovirus_C,_complete_genome 1 35937 6 52 0.144698 0.00659487 32.4 255
NC_002645.1_Human_coronavirus_229E,_complete_genome 1 27317 6 40 0.146429 0.00845627 32.5 255
NC_001798.1_Human_herpesvirus_2,_complete_genome 1 154746 24 154 0.0995179 0.00587414 33.8 22.7
NC_009333.1_Human_herpesvirus_8,_complete_genome 1 137969 2 47 0.0340656 0.00050736 34.3 1
NC_018464.1_Shamonda_virus_N_and_NSs_genes,_segment_S,_genomic_RNA,_isolate_Ib_An_5550 1 927 20 68 7.33549 0.942826 35.2 255
NC_010708.1_Thottapalayam_virus_segment_M,_complete_sequence 1 3621 2 41 1.13228 0.0196078 33.1 3
NC_001672.1_Tick-borne_encephalitis_virus,_complete_genome 1 11141 37 60 0.538551 0.153128 32 187
gi|9627396|lcl|HPV9REF.1|_Human_papillomavirus9(HPV9),_complete_genome 1 7434 32 47 0.63223 0.159941 32.6 160
gi|12084981|lcl|HPV71REF.1|_Human_papillomavirus71(HPV71),_complete_genome 1 8037 7 37 0.460371 0.0276222 33.4 255
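
Side note for readers: in the output above, the Hepatitis C genotype 1 row has 2979 reads but only 276 covered bases (about 2.9% of the reference), so many reads pile up on a short stretch. A minimal Python sketch for flagging rows like that is below. The helper names (`parse_coverage`, `piled_up`) and both thresholds are illustrative assumptions, not part of VIRTUS2 or samtools, and the parser assumes whitespace-separated columns in the order shown in the header.

```python
from typing import NamedTuple


class CoverageRow(NamedTuple):
    rname: str
    numreads: int
    covbases: int
    coverage: float  # percent of the reference covered by at least one read


def parse_coverage(text: str) -> list[CoverageRow]:
    """Parse whitespace-separated rows laid out like the table above."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header (with or without a leading '#').
        if not fields or fields[0].lstrip("#") == "rname":
            continue
        rname, _start, _end, numreads, covbases, coverage = fields[:6]
        rows.append(CoverageRow(rname, int(numreads), int(covbases), float(coverage)))
    return rows


def piled_up(rows, min_reads=100, max_coverage_pct=5.0):
    """Rows with many reads on a small part of the reference (assumed thresholds)."""
    return [r for r in rows if r.numreads >= min_reads and r.coverage <= max_coverage_pct]


example = """\
rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
NC_004102.1_Hepatitis_C_virus_genotype_1,_complete_genome 1 9646 2979 276 2.86129 15.0274 33.2 195
NC_001405.1_Human_adenovirus_C,_complete_genome 1 35937 6 52 0.144698 0.00659487 32.4 255
"""

flagged = piled_up(parse_coverage(example))
for row in flagged:
    print(row.rname, row.numreads, row.covbases)
```

A flagged row is only a hint that the hit deserves a closer look at the reads themselves; it does not prove the hit is spurious.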

I then extracted the hepatitis reads into a FASTA file using samtools (first to BAM, then to FASTA). But when I take an individual read and BLAST it on NCBI, I always get 0 results. I guess I am misunderstanding something?

Example reads

A00182:966:HWKYVDSX5:1:2409:14534:30107/1 CCTTTCTTTTTTTTTTTTCTCTCTTTTTTTTTTTTTTTGGTTTC
A00182:966:HWKYVDSX5:4:2160:13639:2879/1 CCTTTCTTTTTTTTTTTTTTTTTCTTTTTGTGTTTTGTTTC
A00182:966:HWKYVDSX5:1:1229:9010:27837/1 ATCTTCCTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTATTTTTTTG
A00182:966:HWKYVDSX5:1:2455:2618:13260/1 ATCTTCCTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTGTTTTTTTG
A00182:966:HWKYVDSX5:2:2511:13847:28635/1 ATCTTCCTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTGTTTTTTTG
A00182:966:HWKYVDSX5:1:1549:8612:16783/1 CTTCTCTCTTTTTTTTTTTTTCCTTTTTTTTTTTTTTTTTTGG
A00182:966:HWKYVDSX5:2:1537:3613:28228/1 TTTCTCTTTTTTCCTTTTTTTTTTTTTTTTTTATTTCTTTG
A00182:966:HWKYVDSX5:2:2431:19488:34303/1 TTCTTTCTTTTTTTTTTTTTCTTTTTTTTTTTTCTCTTTGTTTTTTTTTTTTTTA

sirrgang commented 5 months ago

Figured it out: these are low-complexity reads, and the default settings on the BLAST website filter those out of the result set.
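
For anyone who lands here with the same problem, a rough Python sketch for checking whether extracted reads are low complexity before pasting them into BLAST is shown below. It is a crude stand-in, not the filter BLAST itself applies: it only measures the fraction of the read taken up by its most common base. The function names and the 0.7 threshold are assumptions chosen for illustration.

```python
from collections import Counter


def dominant_base_fraction(seq: str) -> float:
    """Fraction of the read made up of its single most common base."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return Counter(seq).most_common(1)[0][1] / len(seq)


def is_low_complexity(seq: str, threshold: float = 0.7) -> bool:
    """Crude heuristic: True if one base dominates the read (assumed threshold)."""
    return dominant_base_fraction(seq) >= threshold


# Two reads from the example above, plus a made-up balanced sequence for contrast.
reads = {
    "A00182:966:HWKYVDSX5:1:2409:14534:30107/1": "CCTTTCTTTTTTTTTTTTCTCTCTTTTTTTTTTTTTTTGGTTTC",
    "A00182:966:HWKYVDSX5:1:1549:8612:16783/1": "CTTCTCTCTTTTTTTTTTTTTCCTTTTTTTTTTTTTTTTTTGG",
    "balanced_control": "ACGTACGTACGTACGTACGT",
}

for name, seq in reads.items():
    print(name, round(dominant_base_fraction(seq), 2), is_low_complexity(seq))
```

Reads that this check flags are mostly runs of a single base (T in the examples above), which is the kind of sequence a low-complexity filter masks before searching.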