Closed zhouzhendiao closed 11 months ago
@zhouzhendiao Thank you for reaching out and providing detailed information regarding the issue you've encountered with the nf-core RNA-seq pipeline. It seems that there is a format incompatibility with the BED file you are using.
The error message you are seeing is indicative of a mismatch between the expected input format for the `read_distribution.py` script and the format of your BED file. The script is expecting a BED file with at least six columns, where the sixth column contains strand information (either '+' or '-'). However, your BED file seems to have only four columns and does not include strand information, which is why the script fails with an `IndexError`.
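To make the failure mode concrete, here is a minimal sketch of how indexing the sixth column of a 4-column BED line raises an `IndexError`. This is not the actual `read_distribution.py` source; `strand_of` is a hypothetical helper, and the sample line assumes tab-separated columns.

```python
def strand_of(line: str) -> str:
    """Return the sixth (strand) column of a tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    # With only four columns, fields has indices 0..3, so fields[5] fails.
    return fields[5]


# Hypothetical sample lines, modelled on the BED example in this thread.
four_col = "chr1\t14694\t14814\tref|WASH7P"
six_col = four_col + "\t0\t+"

print(strand_of(six_col))

try:
    strand_of(four_col)
except IndexError as exc:
    print("IndexError:", exc)
```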
The nf-core RNA-seq pipeline typically requires a BED6 or BED12 format for accurate processing, where the first six columns are chrom, chromStart, chromEnd, name, score, and strand.
To resolve this issue, you will need to convert your 4-column BED file into a BED6 or BED12 format. This involves adding two additional columns for the score and strand. The score can often be set to a default value (such as `0` or `.`) if not used, and the strand can be set to `+`, `-`, or `.` if the strand information is not available.
Here is an example of how to format a BED6 file from your existing data:
chr1 14694 14814 ref|WASH7P 0 +
chr1 14928 15048 ref|WASH7P 0 +
...
Please ensure that the strand information (`+`/`-`) is accurate for your data. If you do not have this information, you may use `.` as a placeholder, but be aware that this might affect downstream analysis that is strand-specific.
If you require assistance with converting your BED file to the appropriate format, there are tools available that can help automate this process, such as `awk` for command-line text processing or more specialized bioinformatics tools.
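As one possible way to automate the conversion, the sketch below pads each 4-column record with a default score and strand. The names `bed4_to_bed6` and `convert_file` are hypothetical, the defaults (`0` and `.`) are placeholders only, and real strand values should come from your annotation wherever strand-specific analysis matters. It assumes whitespace-separated input and writes tab-separated output.

```python
def bed4_to_bed6(line: str, score: str = "0", strand: str = ".") -> str:
    """Pad a BED line to six columns; lines with six or more are kept as-is."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}")
    if len(fields) == 4:
        fields += [score, strand]   # add both score and strand
    elif len(fields) == 5:
        fields.append(strand)       # score already present, add strand only
    return "\t".join(fields)


def convert_file(in_path: str, out_path: str, strand: str = ".") -> None:
    """Convert every record in in_path and write the result to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Pass blank lines and track/browser/comment headers through unchanged.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                dst.write(line)
                continue
            dst.write(bed4_to_bed6(line, strand=strand) + "\n")


# Quick check on a single record taken from the example in this thread.
print(bed4_to_bed6("chr1\t14694\t14814\tref|WASH7P", strand="+"))
```

Calling `convert_file("input.bed", "output.bed6.bed")` (both file names are placeholders) would write the padded file, which could then be passed to `--gene_bed`.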
Once you have a correctly formatted BED file, you should be able to rerun the pipeline without encountering the previous error.
Since this is unrelated to the functionality of this pipeline I'm closing the issue. Good luck with your project.
Hi @pinin4fjords,
Thanks for your detailed tutorial!
Description of the bug
My RNA-seq datasets were captured by TrueSeq rather than by PolyA, so I set the `--gene_bed` parameter manually. The TrueSeq BED file only has 4 columns:
An error occurred when the run reached the RSEQC_READDISTRIBUTION step:
By default, the rnaseq pipeline generates gene_bed from the GTF file. The required BED format seems to be BED6/BED12. The corresponding value of `fields[5]` is the strand info (+/-). So how can I change the BED format?
Command used and terminal output
No response
Relevant files
No response
System information
No response