Closed alanhoyle closed 2 years ago
I had a look at this using the probeContentType() method, but although it works OK for gzipped files, it gives a null result for bzipped files, so it doesn't really work in this context. Shame really, as this would have been a nice addition and easy to add, but I'm not going to spend a long time working out more robust auto-detection for something which is quite niche, sorry.
Thanks for taking a glance.
We have a few files that have been inaccurately named. E.g. instead of blah.R1.fastq.bz2, it might be blah.R1.fastq.bz.
This causes FastQC to fail with the following error:
However, the files are properly formatted bzip2 fastq files, just with the wrong extension:
The problem occurs because FastQC/Sequence/FastQFile.java determines the file type from the file extension rather than from the actual file contents.
I would suggest looking at the java.nio.file.Files.probeContentType() method or Apache's Tika library to determine file MIME types instead of relying on the filenames being accurate.
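For illustration, one lightweight alternative to both suggestions is to check the leading "magic" bytes of the file directly: gzip streams begin with the bytes 0x1f 0x8b, and bzip2 streams begin with the ASCII characters "BZh". The sketch below is only a rough outline of that idea. The class name CompressionSniffer, the Kind enum, and the detect() methods are hypothetical names made up for this example; none of them are part of FastQC, and this is not how FastQFile.java currently works.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of FastQC: guesses the compression
// format from the first bytes of a file instead of from its name.
class CompressionSniffer {

    /** Compression formats this sketch can recognise. */
    public enum Kind { GZIP, BZIP2, UNKNOWN }

    /** Classifies a buffer by its leading magic bytes. */
    public static Kind detect(byte[] header) {
        // gzip streams start with the two bytes 0x1f 0x8b
        if (header.length >= 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b) {
            return Kind.GZIP;
        }
        // bzip2 streams start with the ASCII characters "BZh"
        if (header.length >= 3
                && header[0] == 'B'
                && header[1] == 'Z'
                && header[2] == 'h') {
            return Kind.BZIP2;
        }
        return Kind.UNKNOWN;
    }

    /** Reads up to the first three bytes of a file and classifies them. */
    public static Kind detect(String path) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            byte[] header = new byte[3];
            int filled = 0;
            // read() may return fewer bytes than requested, so loop
            // until the buffer is full or the stream ends
            while (filled < header.length) {
                int n = in.read(header, filled, header.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            return detect(Arrays.copyOf(header, filled));
        }
    }

    public static void main(String[] args) throws IOException {
        // Print the detected format for each file given on the command line
        for (String path : args) {
            System.out.println(path + "\t" + detect(path));
        }
    }
}
```

With something like this, a file named blah.R1.fastq.bz would still be classified as BZIP2 as long as its contents are a valid bzip2 stream, and the extension could be kept as a fallback when the result is UNKNOWN.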