maickrau / ribotin

MIT License

Selection of rDNA tangle nodes #10

Closed by jiadong324 1 month ago

jiadong324 commented 10 months ago

Hi,

Thanks for providing the nice tool for rDNA analysis!

I have a verkko assembly, including assembly.homopolymer-compressed.noseq.gfa. How can I manually select the rDNA tangles from that file?

Looking forward to your reply! Thanks!

maickrau commented 10 months ago

Hi, you can first open the assembly graph in Bandage (https://github.com/rrwick/Bandage) and find the rDNA tangles. Once you've found them, select the nodes of a tangle in Bandage and copy-paste them from "Selected nodes" on the right side of Bandage into a file, with one file per tangle. Then give the tangle files to ribotin-verkko with the parameter -c. For example, if you have two tangles in files tangle1.txt and tangle2.txt, add the parameters -c tangle1.txt -c tangle2.txt.
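For anyone who wants to script the file-per-tangle step, here is a minimal sketch. The helper names write_tangle_files and tangle_parameters are hypothetical and not part of ribotin. The pasted text is written to each file verbatim, since this thread does not specify the file format beyond copy-pasting from "Selected nodes".

```python
from pathlib import Path


def write_tangle_files(pasted_selections, out_dir):
    """Write each pasted "Selected nodes" string to its own file.

    pasted_selections: list of strings, one per tangle, pasted verbatim
    from Bandage. Files are named tangle1.txt, tangle2.txt, ... to match
    the naming used in the example above.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(pasted_selections, start=1):
        path = out_dir / f"tangle{index}.txt"
        # Keep the pasted text as-is, only normalising the trailing newline.
        path.write_text(text.strip() + "\n")
        paths.append(path)
    return paths


def tangle_parameters(paths):
    """Build the repeated -c parameters, one -c per tangle file."""
    args = []
    for path in paths:
        args.extend(["-c", str(path)])
    return args
```

Calling tangle_parameters(["tangle1.txt", "tangle2.txt"]) gives the list -c tangle1.txt -c tangle2.txt, which can be appended to the rest of the ribotin-verkko command line.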

jiadong324 commented 7 months ago

Hi Mikko,

I am using the latest version of ribotin-verkko. I ran the command ribotin-verkko -x human -i $asm_dir -o $asm_dir/ribotin and got the error below:

terminate called after throwing an instance of 'std::logic_error'
what():  basic_string: construction from null is not valid

Looking forward to your reply! Thanks!

maickrau commented 7 months ago

Could you share the full command and log output?

jiadong324 commented 7 months ago

Here is the full command:

module load verkko/2.0

asm_dir=./assembler_acros/verkko_v2/NA12877 

verkko -d $asm_dir --hifi $( cat $asm_dir/fastq.fofn ) --nano $( cat $asm_dir/ont_read.fofn ) --hap-kmers ./meryl/NA12877/MATERNAL_all.compress.k30.meryl ./meryl/NA12877/PATERNAL_all.compress.k30.meryl trio --snakeopts "-j 50 -k --restart-times 1" --screen human

module load ribotin/1.3
ribotin-verkko -x human -i $asm_dir -o $asm_dir/ribotin

Verkko's outputs look fine. Here is the log for ribotin-verkko:

ribotin-verkko version bioconda 1.3
checking for MBG
/software/modules-sw/ribotin/1.3/Linux/CentOS7/x86_64/bin/MBG
checking for GraphAligner
/software/modules-sw/ribotin/1.3/Linux/CentOS7/x86_64/bin/GraphAligner
checking for liftoff
/software/modules-sw/ribotin/1.3/Linux/CentOS7/x86_64/bin/liftoff
do UL analysis: yes
output prefix: /assembler_acros/verkko_v2/NA12877/ribotin
guessing tangles
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string: construction from null is not valid
Aborted

maickrau commented 7 months ago

Looks like there might be some issue with ribotin trying to load the template sequences when ribotin is in a module. Could you try downloading the template sequences folder https://github.com/maickrau/ribotin/tree/master/template_seqs somewhere and then adding the parameters

--guess-tangles-using-reference <templatefolder>/chm13_rDNAs.fa --orient-by-reference <templatefolder>/rDNA_one_unit.fasta --annotation-reference-fasta <templatefolder>/rDNA_one_unit.fasta --annotation-gff3 <templatefolder>/rDNA_annotation.gff3

with <templatefolder> replaced by the path where you downloaded the template sequences.
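As a rough sketch, the parameter list above could be generated from a folder path, with a check that the expected files are actually there. The function name template_parameters is hypothetical. The flags and file names are copied from the suggestion above, and nothing here is part of ribotin itself.

```python
from pathlib import Path

# Flag and file-name pairs exactly as given in the suggested parameters.
TEMPLATE_PARAMETERS = [
    ("--guess-tangles-using-reference", "chm13_rDNAs.fa"),
    ("--orient-by-reference", "rDNA_one_unit.fasta"),
    ("--annotation-reference-fasta", "rDNA_one_unit.fasta"),
    ("--annotation-gff3", "rDNA_annotation.gff3"),
]


def template_parameters(template_folder):
    """Return the template-sequence parameters for a downloaded folder.

    Raises FileNotFoundError if any expected file is missing, so a wrong
    folder path is reported before the tool is started.
    """
    folder = Path(template_folder)
    missing = sorted(
        {name for _, name in TEMPLATE_PARAMETERS if not (folder / name).is_file()}
    )
    if missing:
        raise FileNotFoundError(
            f"missing in {folder}: {', '.join(missing)}"
        )
    args = []
    for flag, name in TEMPLATE_PARAMETERS:
        args.extend([flag, str(folder / name)])
    return args
```

The returned list can be joined with spaces and appended to the ribotin-verkko command. The existence check only confirms that the files are present; it says nothing about whether ribotin can load them.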

jiadong324 commented 7 months ago

Here is my command:

ribotin-verkko -x human -i $asm_dir -o $asm_dir/ribotin --guess-tangles-using-reference ./ribotin/template_seqs/chm13_rDNAs.fa --orient-by-reference ./ribotin/template_seqs/rDNA_one_unit.fasta --annotation-reference-fasta ./ribotin/template_seqs/rDNA_one_unit.fasta --annotation-gff3 ./ribotin/template_seqs/rDNA_annotation.gff3

I got the same error:

ribotin-verkko version bioconda 1.3
checking for MBG
/software/modules-sw/ribotin/1.3/Linux/CentOS7/x86_64/bin/MBG
checking for GraphAligner
/software/modules-sw/ribotin/1.3/Linux/CentOS7/x86_64/bin/GraphAligner
checking for liftoff
/software/modules-sw/ribotin/1.3/Linux/CentOS7/x86_64/bin/liftoff
do UL analysis: yes
output prefix: /assembler_acros/verkko_v2/NA12877/ribotin
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string: construction from null is not valid
Aborted

It looks like ribotin cannot find a file it needs. Here are the files under $asm_dir:

drwxrwsr-x 6 jdlin eichlerlab        4096 Mar 26 17:33 0-correction
drwxrwsr-x 2 jdlin eichlerlab        4096 Mar 27 02:38 1-buildGraph
drwxrwsr-x 2 jdlin eichlerlab        4096 Mar 27 15:14 2-processGraph
drwxrwsr-x 3 jdlin eichlerlab        8192 Mar 27 13:04 3-align
drwxrwsr-x 3 jdlin eichlerlab        4096 Mar 27 13:56 3-alignTips
drwxrwsr-x 2 jdlin eichlerlab        8192 Mar 27 15:14 4-processONT
drwxrwsr-x 2 jdlin eichlerlab        4096 Mar 27 15:15 5-untip
drwxrwsr-x 2 jdlin eichlerlab        4096 Mar 27 16:19 6-layoutContigs
drwxrwsr-x 2 jdlin eichlerlab        4096 Mar 27 16:06 6-rukki
drwxrwsr-x 3 jdlin eichlerlab        4096 Mar 28 09:17 7-consensus
-rw-rw-r-- 1 jdlin eichlerlab     1529131 Mar 27 16:06 assembly.colors.csv
-rw-rw-r-- 1 jdlin eichlerlab    19140523 Mar 28 09:16 assembly.disconnected.fasta
-rw-rw-r-- 1 jdlin eichlerlab           0 Mar 28 09:16 assembly.ebv.exemplar.fasta
-rw-rw-r-- 1 jdlin eichlerlab           0 Mar 28 09:16 assembly.ebv.fasta
-rw-rw-r-- 1 jdlin eichlerlab  6804259509 Mar 28 09:18 assembly.fasta
-rw-rw-r-- 1 jdlin eichlerlab  2499260672 Mar 28 09:18 assembly.haplotype1.fasta
-rw-rw-r-- 1 jdlin eichlerlab  2402573516 Mar 28 09:18 assembly.haplotype2.fasta
-rw-rw-r-- 1 jdlin eichlerlab      817662 Mar 28 09:18 assembly.hifi-coverage.csv
-rw-rw-r-- 1 jdlin eichlerlab  3909042110 Mar 28 09:18 assembly.homopolymer-compressed.gfa
-rw-rw-r-- 1 jdlin eichlerlab   607435189 Mar 28 09:18 assembly.homopolymer-compressed.layout
-rw-rw-r-- 1 jdlin eichlerlab     4896500 Mar 28 09:18 assembly.homopolymer-compressed.noseq.gfa
-rw-rw-r-- 1 jdlin eichlerlab       16871 Mar 28 09:16 assembly.mito.exemplar.fasta
-rw-rw-r-- 1 jdlin eichlerlab       88917 Mar 28 09:16 assembly.mito.fasta
-rw-rw-r-- 1 jdlin eichlerlab      807000 Mar 28 09:18 assembly.ont-coverage.csv
-rw-rw-r-- 1 jdlin eichlerlab      964923 Mar 27 16:06 assembly.paths.tsv
-rw-rw-r-- 1 jdlin eichlerlab       46211 Mar 28 09:16 assembly.rdna.exemplar.fasta
-rw-rw-r-- 1 jdlin eichlerlab    11331119 Mar 28 09:16 assembly.rdna.fasta
-rw-rw-r-- 1 jdlin eichlerlab      793086 Mar 28 09:18 assembly.scfmap
-rw-rw-r-- 1 jdlin eichlerlab  1902425321 Mar 28 09:18 assembly.unassigned.fasta
-rw-rw-r-- 1 jdlin eichlerlab 39763943318 Mar 26 17:33 hifi-corrected.fasta.gz
-rw-rw-r-- 1 jdlin eichlerlab        1574 Mar 21 09:40 ont_read.fofn
-rwxrwxr-x 1 jdlin eichlerlab         462 Mar 28 11:32 snakemake.sh
-rw-rw-r-- 1 jdlin eichlerlab        7153 Mar 28 11:32 verkko.yml

jiadong324 commented 1 month ago

The problem was solved by adding -x human -x -c and the parameters for the template sequences.