ocxtal / minialign

[IMPORTANT: not for real data analysis, only for algorithm evaluation] fast and accurate alignment tool for PacBio and Nanopore long reads
MIT License

Minialign ONT parameters #11

Open npavlovikj opened 6 years ago

npavlovikj commented 6 years ago

Hi,

I am comparing a few nanopore aligners on ONT 1D and ONT 2D data, so I would like to verify that the general commands below are correct for those types of reads:

    minialign -d ref_index.mai ref.fa
    minialign -x ont.r9.2d -O sam -T MD,AS,NM ref_index.mai input.fasta > output.sam
    minialign -x ont.r9.1d -O sam -T MD,AS,NM ref_index.mai input.fasta > output.sam

Or do I need to specify some other parameters as well?

I would highly appreciate your input on this.

Thank you, Natasha

ocxtal commented 6 years ago

Sorry for being late. And thank you for testing minialign.

Everything seems correct if input.fasta contains Nanopore reads. If you want to change index parameters such as k-mer length (-k) and window size (-w), they must be specified when the index is created.

Thanks,

Hajime Suzuki

npavlovikj commented 6 years ago

Thanks @ocxtal ! My input data is nanopore reads, and I think I will use the default index parameters for now.

Another question - one of my genomes is circular, so is adding "-c '*'" to "minialign -d" enough?

ocxtal commented 6 years ago

Yes, -c is only needed (and only takes effect) when the index is built. But you might need to modify the argument, because -c '*' marks all the sequences as circular. If you want to mark only specific ones, such as mitochondria and chloroplast, -c chrM,chrC (comma-separated, without spaces) would be more appropriate.
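
A minimal sketch of assembling such an argument, assuming POSIX sh. The helper name join_circular is hypothetical, and the command is only printed as a dry run; placing -c before -d is an assumption, since the thread only says that -c is added to the "minialign -d" step.

```shell
#!/bin/sh
# Hypothetical helper: join sequence names with commas and no spaces,
# the form described above for the -c argument.
join_circular() {
    old_ifs=$IFS
    IFS=,
    joined="$*"        # "$*" joins arguments with the first character of IFS
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

circ=$(join_circular chrM chrC)

# Dry run: print the index command instead of executing it.
# The option order shown here is an assumption, not taken from the manual.
echo "minialign -c $circ -d ref_index.mai ref.fa"
```

Because -c only matters at index time, the alignment commands that read ref_index.mai stay unchanged.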

npavlovikj commented 6 years ago

And the sequence name after "-c" is the name of the reference sequence? For example, if one of the reference sequences I want to mark as circular is:

>U00096.3 Escherichia coli str. K-12 substr.
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCA

I should use "-c U00096.3 Escherichia coli str. K-12 substr."?

I apologize for the question, but I didn't find much information about the proper syntax of "-c" in the manual.

ocxtal commented 6 years ago

In this case the correct argument will be -c U00096.3. The fasta/q parser first splits the name row on spaces, and recognizes the first column as the sequence name and the rest as comments. The comments are saved together in the CO:Z tag when the -T CO option is specified. (I found there was a bug in the -T CO option and fixed it just now. Sorry for the inconvenience.)
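
The splitting rule described above can be illustrated with a short sketch, assuming POSIX sh. The functions fasta_name and fasta_comment are hypothetical helpers written for this example, not part of minialign, and they split on the first space only, following the description in this thread.

```shell
#!/bin/sh
# Hypothetical helper: first space-delimited field of a FASTA header line,
# with a leading '>' removed if present. This is the value to pass to -c.
fasta_name() {
    header=${1#">"}
    printf '%s\n' "${header%% *}"
}

# Hypothetical helper: everything after the first space (empty if none),
# i.e. the part described above as comments.
fasta_comment() {
    header=${1#">"}
    case $header in
        *" "*) printf '%s\n' "${header#* }" ;;
        *)     printf '\n' ;;
    esac
}

hdr='>U00096.3 Escherichia coli str. K-12 substr.'
fasta_name "$hdr"       # prints U00096.3
fasta_comment "$hdr"    # prints Escherichia coli str. K-12 substr.
```

Running fasta_name on each header of the reference is a quick way to check which names are usable in a comma-separated -c list.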

npavlovikj commented 6 years ago

This is really useful information - thank you so much @ocxtal !