yangao07 / TideHunter

TideHunter: efficient and sensitive tandem repeat detection from noisy long reads using seed-and-chain
MIT License

TideHunter with flags doesn't output different data format or filter data #11

Open · mcrone opened this issue 10 months ago

mcrone commented 10 months ago

Hi,

Thanks for creating TideHunter; it is exactly what I need for a specific application I'm working on. I've tried running the tool, but adding the -m, -f, and -l flags doesn't seem to change the final output (-c does). I can't get anything other than FASTA output, and I can't filter the output by minimum length.

I've tried using both the direct command line and the docker container.

This is my command: ./bin/TideHunter -u -l -m 1000 -c 2 -f 2 ./fastq/barcode43.fastq.gz > outputtest.out

This is the Docker command: docker run -v "/Users/xxxxx/xxxxx/xxxxx/barcode43":/data quay.io/biocontainers/tidehunter:1.5.4--h43eeafb_2 TideHunter -u -l -m 1000 -c 3 -f 2 /data/barcode43.fastq.gz -o data/outputtest.out

Is there anything obvious that I am doing incorrectly?

yangao07 commented 10 months ago

"-f 2" should give you tabular output, not FASTA. Can you paste some of your output here?
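One quick way to check whether "-f 2" took effect is to look at the first character of the output file: FASTA records start with a `>` header line, whereas the "-f 2" line posted in this thread starts directly with the read name. The sketch below relies only on that rule of thumb and treats anything that is not FASTA as tabular, which is an assumption; `guess_output_format` is a hypothetical helper, not part of TideHunter.

```shell
guess_output_format() {
  # Hypothetical helper (not part of TideHunter): classify an output file by
  # the first character of its first line.
  local file="$1" first_char
  if [ ! -s "$file" ]; then
    echo "empty"            # missing or zero-length file
    return
  fi
  first_char="$(head -c 1 "$file")"
  case "$first_char" in
    '>') echo "fasta" ;;    # FASTA records start with a '>' header line
    *)   echo "tabular" ;;  # assumption: anything else is tabular (-f 2) output
  esac
}
```

For example, `guess_output_format outputtest.out` should print `tabular` for the output shown later in this thread.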

mcrone commented 10 months ago

This works with the source from the repository. However, the Docker version does not work, even when I build the image from scratch.

When running the following command: docker run -v "/Users/xxxxxx/barcode43":/data quay.io/biocontainers/tidehunter:1.5.4--h43eeafb_2 TideHunter -u -f 2 /data/barcode43.fastq.gz --output data/outputtest.out

Docker doesn't seem to accept the use of '>'.

Either way, I end up with the following output:

f3145286-baa6-4e70-89ad-44d859164aa2 rep0 sub0 AGTCAACAACACCGCCAGCAGGCCGCGCACAATGCGCCCTTCGCTGTCGCCAAAGAAATGCATTTTGCCGTTTTCAGCCACTGTATATCCCAGCCAGACGCGGTTTTCGCATCCGGCAATCTCTTTAGCCTGCGCTTTTAACTCGTCTGGCAATGCCGGAAGCTGTTTCCCCAGCATGATCAACTGGCGATATTTATCTTCCCATTGCGTGAACGGTGCGAAGGTATTGCGTAACGTTTCTGCGGTTACGGTTGTGCCGAACGGATGTCCGGCGAATTGCGGGTTTGTCATTAATCCACCAATAATTCCAGCGCGCGGTCAACGG
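To scan output like this without scrolling through long sequences, one option is to print only the length of each unit. This is a minimal sketch that assumes, from the single sample line above only, that each line has four whitespace-separated columns: read name, repeat ID (`rep0`), sub-unit ID (`sub0`), and the unit sequence. `unit_lengths` is a hypothetical helper name.

```shell
unit_lengths() {
  # Hypothetical helper (not part of TideHunter).
  # Assumed columns, inferred from one sample line of `-u -f 2` output:
  #   1 = read name, 2 = repeat ID (rep0), 3 = sub-unit ID (sub0), 4 = unit sequence
  # Prints the first three columns plus the length of the unit sequence,
  # tab-separated. Reads the named files, or stdin if none are given.
  awk 'BEGIN { OFS = "\t" } NF >= 4 { print $1, $2, $3, length($4) }' "$@"
}
```

For example, `unit_lengths outputtest.out | head` lists the first few units with their lengths.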

yangao07 commented 10 months ago

Actually, the output you show here is the expected format for "-f 2".

mcrone commented 10 months ago

So it just doesn't include the number of repeats and all of the other information? It also wasn't filtering according to the -m flag: there were many results shorter than 1 kb.
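While the "-m" behaviour is unresolved in this thread, one workaround is to filter the "-u -f 2" output after the fact. This is a minimal sketch of a post-filter, not TideHunter's own length filtering: it assumes the same four-column layout inferred from the sample line (unit sequence in column 4) and keeps only lines whose unit is at least `MIN_LEN` bases long. `filter_units_by_min_len` is a hypothetical name.

```shell
filter_units_by_min_len() {
  # Hypothetical post-filter (not TideHunter's -m option).
  # Assumes column 4 holds the unit sequence, as in the sample `-u -f 2` line.
  # Usage: filter_units_by_min_len MIN_LEN [FILE...]   (reads stdin if no FILE)
  local min_len="$1"
  shift
  # `min + 0` forces a numeric comparison; matching lines are printed unchanged.
  awk -v min="$min_len" 'NF >= 4 && length($4) >= min + 0' "$@"
}
```

For example, `filter_units_by_min_len 1000 outputtest.out > units_min1kb.out` keeps only units of 1 kb or longer.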

yangao07 commented 10 months ago

The other information is omitted because you specified "-u", which outputs only the repeat unit sequences.
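Rerunning without "-u" is the direct way to get the extra information back. If only the "-u" output is at hand, a rough stand-in for the number of repeats is to count how many sub-unit lines each (read, repeat) pair has. This sketch assumes each `subN` line is one repeat unit, which is inferred from the sample output rather than documented behaviour, so the count may not match the copy number TideHunter reports in its full output. `count_units_per_repeat` is a hypothetical helper.

```shell
count_units_per_repeat() {
  # Hypothetical helper (not part of TideHunter).
  # Assumption: column 1 = read name, column 2 = repeat ID, and each line is one unit.
  # Prints "read<TAB>repeat<TAB>unit_count", in order of first appearance.
  awk 'BEGIN { OFS = "\t" }
       NF >= 4 {
         key = $1 OFS $2
         if (!(key in n)) order[++k] = key   # remember first-seen order
         n[key]++
       }
       END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }' "$@"
}
```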