lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

fix gap filling string in fastq file #110

Closed: y9c closed this 6 years ago

y9c commented 6 years ago

Description

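Change '-' to '!', because '-' is meaningful: as a quality character it encodes a real Phred+33 score (12), whereas '!' is the lowest possible score (0).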
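In the Phred+33 encoding used by Sanger and Illumina 1.8+ FASTQ, each quality character encodes score = ASCII code minus 33. So '-' (ASCII 45) reads as a genuine quality of 12, while '!' (ASCII 33) is the floor value 0. The minimal Java sketch below only illustrates the decoding. The class and method names are illustrative and are not part of jvarkit.

```java
public class PhredDemo {
    /** Returns the Phred score for a Phred+33 character, or -1 if it is outside '!'..'~'. */
    public static int phred33(char c) {
        if (c < '!' || c > '~') return -1; // e.g. a space (ASCII 32) is not a valid quality
        return c - 33;
    }

    public static void main(String[] args) {
        for (char c : new char[]{'!', '-', 'I', ' '}) {
            System.out.println("'" + c + "' -> " + phred33(c));
        }
    }
}
```

Running it prints 0 for '!', 12 for '-', 40 for 'I' and -1 for a space.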


lindenb commented 6 years ago

@yech1990 back to you

y9c commented 6 years ago

Sorry for making so many mistakes.

I edited the code on the web page...

lindenb commented 6 years ago

:+1:

y9c commented 6 years ago

@lindenb

It seems that '!' is the only workable character for filling gaps in the quality string.

A space ' ' causes errors in some tools:

ValueError: Lengths of sequence and quality values differs
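The error means the sequence and quality strings ended up with different lengths. One plausible cause, offered as an assumption rather than a diagnosis of any particular tool, is that many FASTQ readers trim trailing whitespace from each line. A space-padded quality string then becomes shorter than its sequence, while '!' survives trimming. The sketch below imitates such a reader with `String.trim()` and checks the two invariants a well-formed record needs: equal lengths, and quality characters within '!'..'~'. `FastqCheck` and `validate` are hypothetical names.

```java
public class FastqCheck {
    /** Returns null if the record is valid, otherwise a short reason. */
    public static String validate(String seq, String qual) {
        if (seq.length() != qual.length()) {
            return "length mismatch: " + seq.length() + " vs " + qual.length();
        }
        for (int i = 0; i < qual.length(); i++) {
            char c = qual.charAt(i);
            if (c < '!' || c > '~') return "invalid quality char at " + i;
        }
        return null;
    }

    public static void main(String[] args) {
        String seq = "ACGT--";
        // Hypothetical reader behaviour: trailing whitespace is trimmed from each line.
        String spacePadded = "IIII  ".trim();
        String bangPadded = "IIII!!".trim();
        System.out.println(validate(seq, spacePadded)); // length mismatch: 6 vs 4
        System.out.println(validate(seq, bangPadded));  // null
    }
}
```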

lindenb commented 6 years ago

Why would you need a FASTQ at the end? What are you trying to do?

lindenb commented 6 years ago

I just added two new options to specify the characters to use for padding and for 'unknown'.
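The thread does not spell out the two new options, so the sketch below does not reproduce jvarkit's actual flags or code. It only illustrates the general idea, under these assumptions. Each read is projected onto a fixed 1-based reference window by walking its CIGAR. Insertions are dropped so every row has the window's length. Positions the read does not cover, or deletes, are filled with configurable characters: `padBase` for the sequence and `padQual` for the quality string. Both are hypothetical parameter names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProjectRead {
    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Returns {bases, quals}, both of length winEnd - winStart + 1. */
    public static String[] project(int alignStart, String cigar, String seq, String qual,
                                   int winStart, int winEnd, char padBase, char padQual) {
        int len = winEnd - winStart + 1;
        char[] bases = new char[len];
        char[] quals = new char[len];
        Arrays.fill(bases, padBase);
        Arrays.fill(quals, padQual);
        int refPos = alignStart; // current 1-based reference position
        int readPos = 0;         // current 0-based index into seq/qual
        Matcher m = CIGAR_OP.matcher(cigar);
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            switch (op) {
                case 'M': case '=': case 'X':
                    for (int i = 0; i < n; i++, refPos++, readPos++) {
                        if (refPos >= winStart && refPos <= winEnd) {
                            bases[refPos - winStart] = seq.charAt(readPos);
                            quals[refPos - winStart] = qual.charAt(readPos);
                        }
                    }
                    break;
                case 'I': case 'S':
                    readPos += n; // consumes the read only; nothing is placed in the window
                    break;
                case 'D': case 'N':
                    refPos += n; // consumes the reference only; these positions keep the pad chars
                    break;
                default: // 'H' and 'P' consume neither
                    break;
            }
        }
        return new String[]{new String(bases), new String(quals)};
    }

    public static void main(String[] args) {
        // Read aligned at position 3: 2 matches, 1-base deletion, 1 insertion, 2 matches.
        String[] row = project(3, "2M1D1I2M", "ACTGG", "IIJKL", 1, 8, '-', '!');
        System.out.println(row[0]); // --AC-GG-
        System.out.println(row[1]); // !!II!KL!
    }
}
```

In this simplified version deletions and uncovered positions share the same pad character. A real tool might want a separate character for each.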

y9c commented 6 years ago

I use an aligner for targeted sequencing data analysis. I need to count the exact mutations on each read separately. Other tools such as samtools mpileup lose the information about which mutations occur together on the same read, so sam4weblogo is the ideal choice. I also need to generate a multiple sequence alignment file for phylogenetic tree construction.
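A pileup summarises each reference column independently, so it no longer records which mutations were seen together on the same read. Once every read is laid out as an equal-length row over the same reference window, that per-read information is preserved. The same rows can then be written out directly as an aligned FASTA for tree building. The Java sketch below assumes the rows were already produced with '-' as padding. The class name, method names and toy data are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReadMutations {
    /** Lists 1-based window offsets where the row differs from the reference, skipping pad chars. */
    public static List<String> mutations(String ref, String row, char pad) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < ref.length(); i++) {
            char r = ref.charAt(i), b = row.charAt(i);
            if (b == pad) continue; // uncovered or deleted; not reported in this simplified sketch
            if (Character.toUpperCase(r) != Character.toUpperCase(b)) {
                out.add((i + 1) + ":" + r + ">" + b);
            }
        }
        return out;
    }

    /** Writes the rows as an aligned FASTA; all rows are assumed to share the window length. */
    public static String toFasta(Map<String, String> rows) {
        StringBuilder sb = new StringBuilder();
        rows.forEach((name, row) -> sb.append('>').append(name).append('\n').append(row).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        String ref = "ACGTACGT";
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("read1", "ACGAACTT");
        rows.put("read2", "--GTACTT");
        // Each read keeps its own list, so co-occurring mutations stay linked.
        rows.forEach((name, row) -> System.out.println(name + " " + mutations(ref, row, '-')));
        System.out.print(toFasta(rows));
    }
}
```

Here read1 carries both 4:T>A and 7:G>T, and read2 carries only 7:G>T. A per-column pileup would report the two changes without that linkage.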

lindenb commented 6 years ago

"I use an aligner for targeted sequencing data analysis. I need to count the exact mutations on each read separately."

Use sam2tsv: http://lindenb.github.io/jvarkit/Sam2Tsv.html

y9c commented 6 years ago

@lindenb

It is not the final solution for my analysis pipeline. Generating an MSA file from a BAM is much faster than running any MSA software, but some bugs still exist; I am trying my best to figure out the problem.

y9c commented 6 years ago

I did use sam2tsv. It is powerful for read-by-read analysis. I used a Python script to rebuild a FASTQ file from the sam2tsv output until I found sam4weblogo.
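Rebuilding reads from per-base output means grouping rows by read name and concatenating bases and qualities in read order. The Java sketch below illustrates that approach. It assumes a deliberately simplified, hypothetical four-column layout (`name`, `readPos`, `base`, `qual`, with `readPos` set to "." for deleted positions), not sam2tsv's real output. Check the actual columns against the sam2tsv documentation before adapting the indices.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TsvToFastq {
    /** Rebuilds FASTQ records from rows in the HYPOTHETICAL layout name\treadPos\tbase\tqual. */
    public static String rebuild(Iterable<String> lines) {
        // read name -> (read position -> {base, qual}); keeps first-seen read order
        Map<String, TreeMap<Integer, char[]>> reads = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) continue; // skip header/comment lines
            String[] f = line.split("\t");
            if (f[1].equals(".")) continue; // deletion: no read base to emit
            reads.computeIfAbsent(f[0], k -> new TreeMap<>())
                 .put(Integer.parseInt(f[1]), new char[]{f[2].charAt(0), f[3].charAt(0)});
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, TreeMap<Integer, char[]>> e : reads.entrySet()) {
            StringBuilder seq = new StringBuilder();
            StringBuilder qual = new StringBuilder();
            for (char[] bq : e.getValue().values()) { // TreeMap iterates in read-position order
                seq.append(bq[0]);
                qual.append(bq[1]);
            }
            sb.append('@').append(e.getKey()).append('\n')
              .append(seq).append("\n+\n").append(qual).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "#name\treadPos\tbase\tqual",
            "r1\t1\tC\tI",
            "r1\t0\tA\tH",
            "r1\t.\t.\t.",
            "r1\t2\tG\t#");
        System.out.print(rebuild(rows)); // @r1 / ACG / + / HI#
    }
}
```

Grouping by name alone assumes one alignment per read. Secondary or supplementary alignments of the same read would need a more specific key.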