Closed jvanheld closed 3 months ago
Hi @jvanheld, commit https://github.com/rsa-tools/rsat-code/commit/87a220b2c40039d221d3ca412a0431dfb5fe9c41 adds FASTQ support to sub ReadNextSequence. A sample FASTQ file, test.fq, can be converted to FASTA as follows:
$ cat test.fq
@m64268e_230602_135049/10/ccs np=8 rq=0.998228
ATGCTAAAGAAAAAGTAAAATAAAATTTAAGTAAACAAGTAAATAAAACACATGCATGCA
+
idzX}N~~c;t~~~t~I~~~~@~kfU~~U~}~S~syi~hYE~~p<~~|`hbtigD;f~\f
@m64268e_230602_135049/13/ccs np=3 rq=0.992424
TAAATGTATTTCTCCTCTATCTATTGTGGATTGGGTTTCGAAGTGAGGATAAGCAGAGGA
+
O?c_QRYW<GUQ>B`%JWXVQWXNcOJCOVH[0B@%3AQLIX>RSXFeXXM_QRH5O8^F
$ convert-seq -i test.fq -from fastq -to fasta
>@m64268e_230602_135049/10/ccs np=8 rq=0.998228
ATGCTAAAGAAAAAGTAAAATAAAATTTAAGTAAACAAGTAAATAAAACACATGCATGCA
>@m64268e_230602_135049/13/ccs np=3 rq=0.992424
TAAATGTATTTCTCCTCTATCTATTGTGGATTGGGTTTCGAAGTGAGGATAAGCAGAGGA
Please give it a try, Bruno
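For readers unfamiliar with the two formats, here is a minimal sketch of what such a conversion involves. It is not the RSAT implementation: `fastq_to_fasta` is a hypothetical helper name, and the sketch assumes strict 4-line FASTQ records (header, sequence, `+`, qualities) with no wrapped sequence lines. Unlike the convert-seq output shown above, it drops the leading `@` from each header.

```shell
# Hypothetical helper, not part of RSAT.
# Assumes every FASTQ record spans exactly 4 lines.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header line: swap leading "@" for ">"
       NR % 4 == 2 { print }                     # sequence line: copy unchanged
      ' "$@"
}
```

Called as `fastq_to_fasta test.fq`, it writes FASTA to standard output; with no argument it reads from standard input. The `+` separator and quality lines are simply skipped.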
Great, you are faster than Batman! I will test it this evening and integrate it into a makefile.
So can you confirm this works as expected?
I have had no chance to test it yet, but I intend to do it this evening. For the time being I have only treated two data types.
Hi @brunocontrerasmoreira
The reading of fastq and fastq.gz works fine with convert-seq.
However, I realized that peak-motifs did not have a -seq_format option.
I added it and submitted the update to GitHub, but it will not be usable directly.
However, I can easily manage this in the makefile by setting a condition. Indeed, for genomic data I have to use fetch-sequences to get fasta sequences from peak coordinates. I will just add a conditional statement so that if the input format is fastq.gz I use convert-seq, and if I have bed files with peak coordinates I run fetch-sequences.
So in any case peak-motifs will take a fasta file as input.
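The condition described above could look roughly like the sketch below. It is written as a shell function rather than as the actual makefile rule, which is not shown in this thread. The function name `choose_sequence_tool` and the extension-to-tool mapping are assumptions made for illustration; only the two tool names come from the discussion.

```shell
# Hypothetical helper: pick the RSAT tool that turns the input into fasta.
# The extension patterns are assumptions, not taken from the real makefile.
choose_sequence_tool() {
  case "$1" in
    *.fastq|*.fq|*.fastq.gz|*.fq.gz)
      echo "convert-seq" ;;        # reads: convert fastq to fasta
    *.bed)
      echo "fetch-sequences" ;;    # peak coordinates: retrieve the sequences
    *)
      echo "unsupported input: $1" >&2
      return 1 ;;
  esac
}
```

For example, `choose_sequence_tool reads.fastq.gz` prints `convert-seq` and `choose_sequence_tool peaks.bed` prints `fetch-sequences`. Either way the downstream step receives a fasta file.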
I can generate a new Docker container on Monday...
In the meantime I handled sequence conversion in the makefiles, which in the end is not so bad.
But it is definitely useful to have the possibility to specify sequence format in peak-motifs, to generalize its use.
Job done
The artificial High-Throughput SELEX (HTS) files are provided in fastq format. We need to add an option to convert-seq in order to accept fastq format as input.