ythuang0522 / homopolish

High-quality Nanopore-only genome polisher
GNU General Public License v3.0

Using virus.msh for HSV-1 assembly polishing: fewer than 5 related genomes for polishing #41

Status: Open. rezaeir opened this issue 2 years ago

rezaeir commented 2 years ago

I am trying to use homopolish to improve the assembly consensus after using Raven for assembly and Medaka for primary polishing. However, when I use the virus.msh sketch with my consensus.fasta file as input, I get the following output:

```
homopolish polish -m R10.3.pkl -a consensus.fa -s virus.msh -t 10 -o hpOut
[2021/10/20 13:51] INFO: RUN-ID: consensus
consensus
/home/rezaeir/sequencing_data/121021_FusionHSVLibrary/fastq/fuHSV7/hpOut/debug
[2021/10/20 13:51] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 0 MINS 0 SECS.
This contig consensus closely-related genome is less than 5, not to polish...
TIME Total: 0 MINS 0 SECS.
```

Is there any way I can fix this? I also tried the "-g" option with "humanalphaherpesvirinae_humanalphaherpesvirus1" as the genus_species input, but I am not sure whether that is the right way to write it (the result was unchanged compared with Medaka's output).

ythuang0522 commented 2 years ago

@rezaeir Could you provide one HSV-1 genome to us? First, we found that the number of HSV-1 genomes in the current virus sketch is insufficient, and we would like to update it. Second, the -g option was designed for bacteria but can be adapted for viruses. Finally, we are considering pulling more genomes from NCBI Virus instead of RefSeq. It would be great if you could provide one for revising and testing.

rezaeir commented 2 years ago

@ythuang0522 Unfortunately, the number of HSV-1 genomes in NCBI is very limited. The RefSeq genome https://www.ncbi.nlm.nih.gov/nuccore/NC_001806.2?report=fasta is one you can access. I have also attached a file I generated using ViPR that contains more than 80 full HSV-1 genomes: HSV1_ViPR_DB.zip
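Before handing a downloaded collection such as HSV1_ViPR_DB to homopolish, it can help to confirm how many genomes it actually contains. Below is a minimal Python sketch (not part of homopolish) that counts the records in an extracted FASTA file and reports each sequence length; the function name `fasta_lengths` is hypothetical, and it assumes the zip has already been unpacked into a plain FASTA file.

```python
import sys
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID to its sequence length.

    The ID is the first whitespace-separated token of the header line;
    a repeated ID overwrites the earlier entry.
    """
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequence lines may be wrapped
    return lengths


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_fasta.py HSV1_ViPR_DB.fasta
    with open(sys.argv[1]) as handle:
        records = fasta_lengths(handle)
    print(f"{len(records)} genomes in {sys.argv[1]}")
    for record_id, length in records.items():
        print(f"{record_id}\t{length}")
```

A full HSV-1 genome is roughly 150 kb, so unusually short records in the output are a quick sign of partial sequences mixed into the collection.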

rezaeir commented 2 years ago

I used that ViPR database, and it seems to decrease the number of gaps marginally (from 126 to 117).

ythuang0522 commented 2 years ago

Thanks for your response. What I meant was whether you could provide the viral genome after Medaka polishing for development. I was not aware of ViPR. It looks to me like you could use the -l (local database) option for polishing. However, that mode still lacks the ANI selection step. We are considering adding it to the local DB version. As we don't have an ONT viral genome at hand, it would be better if you could provide one.
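The sketch-based mode (the .msh files are Mash sketches) selects closely related genomes by Mash distance, which approximates 1 - ANI for similar sequences. As an illustration only of what such a selection step computes, here is a minimal Python sketch of the Mash distance formula D = -(1/k) * ln(2J / (1 + J)). It computes the Jaccard index J exactly over all canonical k-mers, whereas real Mash estimates J from MinHash sketches. The function names and the k = 21 default are assumptions for this example, not homopolish code.

```python
import math
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Return the set of canonical k-mers (the lexicographically smaller of
    each k-mer and its reverse complement), skipping k-mers with non-ACGT bases."""
    seq = seq.upper()
    kmers: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in "ACGT" for base in kmer):
            continue  # e.g. N or other IUPAC ambiguity codes
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, revcomp))
    return kmers


def mash_distance(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2J / (1 + J)) from an exact k-mer Jaccard index J."""
    a = canonical_kmers(seq_a, k)
    b = canonical_kmers(seq_b, k)
    union = a | b
    if not union:
        return 1.0  # no valid k-mers at all: treat as unrelated
    jaccard = len(a & b) / len(union)
    if jaccard == 0.0:
        return 1.0  # nothing shared: report the maximum distance
    distance = -math.log(2 * jaccard / (1 + jaccard)) / k
    return min(1.0, max(0.0, distance))


def estimated_ani(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Rough ANI estimate as 1 - D (meaningful mainly for closely related sequences)."""
    return 1.0 - mash_distance(seq_a, seq_b, k)
```

Using canonical k-mers makes the distance independent of which strand each genome was deposited on, which matters when mixing assemblies from different sources.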

rezaeir commented 2 years ago

Sorry for the very late response. The following file is my MinION R9.4 sequencing of an HSV genome with a GFP insertion at its Tk gene locus: RR-tkHSV.raven.medaka.zip

steinbrl commented 1 year ago

I have the same problem. I would like to polish HSV-1, HSV-2, VZV, KSHV and HCMV assemblies. I started with HSV-2, and it failed immediately with the same issue...

SeaneryChang commented 1 year ago

@rezaeir We have polished the virus with -l (local database) using your HSV DB and tested several thresholds. Mismatches and indels were assessed with fastmer (comparing the pre-polish and post-polish files). homopolished_1 is the default result, which matches yours, and we are curious how you obtained the gap counts. We would appreciate it if you could provide a reference genome for the virus so we can adjust our program.

| | mismatch | insertion | deletion |
|---|---|---|---|
| homopolished_1 | 0 | 75 | 65 |
| homopolished_2 | 0 | 66 | 62 |
| homopolished_3 | 0 | 72 | 73 |

hsv.zip
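The mismatch, insertion and deletion columns above summarize differences in an alignment between the pre-polish and post-polish files. The sketch below is not fastmer's code; it is a minimal Python illustration of how such counts can be tallied, assuming the alignment is available as an extended CIGAR string with = and X operations (as minimap2 emits with --eqx). The function name `tally_cigar` is hypothetical, and it counts bases rather than events.

```python
import re
from typing import Dict

# One CIGAR operation: a length followed by an operation code (SAM spec).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_cigar(cigar: str) -> Dict[str, int]:
    """Count matched, mismatched, inserted and deleted bases in an extended CIGAR.

    Assumes '=' / 'X' operations; a plain 'M' cannot separate matches from
    mismatches, so it is rejected. S, H, N and P operations are ignored.
    """
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    consumed = 0
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        consumed += len(length) + 1
        if op == "=":
            counts["match"] += n
        elif op == "X":
            counts["mismatch"] += n
        elif op == "I":
            counts["insertion"] += n  # bases present in the query, absent from the reference
        elif op == "D":
            counts["deletion"] += n  # bases present in the reference, absent from the query
        elif op == "M":
            raise ValueError("'M' is ambiguous; re-align with '='/'X' operations")
    if consumed != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return counts
```

Which file is used as the reference decides whether a given edit is reported as an insertion or a deletion, so swapping the pre-polish and post-polish files swaps those two columns.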

SeaneryChang commented 1 year ago

@steinbrl Hi, you can use the -l (local database) option to polish if you have a virus database. If the program can't find closely related viruses in our database, it skips the contig because there are not enough homologous genomes. It would be great if you could provide your assemblies and database (if you have one).
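The skip described here corresponds to the log line in the original report ("closely-related genome is less than 5, not to polish..."). Below is a minimal Python sketch of that decision, assuming distances from a contig to candidate genomes have already been computed: keep candidates within a distance cutoff and skip the contig if fewer than five remain. The 0.05 cutoff and the function name `select_related` are illustrative assumptions, not homopolish's documented defaults or code.

```python
from typing import Dict, List, Optional

# Taken from the log message "closely-related genome is less than 5, not to polish..."
MIN_RELATED_GENOMES = 5
# Hypothetical distance cutoff for this sketch; not a documented homopolish default.
MAX_DISTANCE = 0.05


def select_related(distances: Dict[str, float],
                   max_distance: float = MAX_DISTANCE,
                   min_genomes: int = MIN_RELATED_GENOMES) -> Optional[List[str]]:
    """Return IDs of genomes within max_distance of the contig, closest first,
    or None when fewer than min_genomes qualify (the contig would be skipped)."""
    close = sorted((dist, genome_id) for genome_id, dist in distances.items()
                   if dist <= max_distance)
    if len(close) < min_genomes:
        return None
    return [genome_id for _, genome_id in close]


if __name__ == "__main__":
    # Hypothetical distances from one contig to four database genomes.
    example = {"genome_a": 0.01, "genome_b": 0.02, "genome_c": 0.20, "genome_d": 0.03}
    result = select_related(example)
    print("not enough related genomes, skipping" if result is None else result)
```

In this toy run only three genomes pass the cutoff, so the contig is skipped, which is the same outcome reported for HSV-1 and HSV-2 against the current virus sketch.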

rezaeir commented 1 year ago

Hi, I've attached a reference FASTA file for HSV-GFP, which is an assembly from very high-depth short-read sequencing: hsv1-gfp-genome.txt. I was also wondering whether you plan to add an internal virus database, perhaps based on NCBI Virus.