Closed JeremyQuo closed 1 year ago
Hi @JeremyQuo,
Thank you for reporting this bug. I recently fixed a couple of bugs on the dev branch, and this sounds like one of them.
Can you please try the latest dev? I recommend cloning a fresh copy and using conda to create an env with python3.9 (instructions are available in the README on the dev branch).
Thanks for the information. I tried the latest version on the dev branch. It fixed the previous bug when I set the region to 1-100 or 1-200. But I found another bug when the region covers the last part of my read.
For example, my read's query length is 10000 and the length of the move table is 9996, which means the first 4 bases are dropped and the signal only covers 9996 bases. So when I set 1-100, it returns 100-5, which is correct.
When I set 9950-10000, it should return 10000-9950, and the first base should be the last base in my read. But the title shows that it returns 9996-9950, and the first base is not the last base in my read: its index is -4, not -1.
I don't know whether my understanding is correct or if there is any problem with the Squigualiser plot.
Thanks
Put another way: if the region starts at 1+4, it should end at 9996+4.
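To make the arithmetic concrete, here is a minimal sketch of the clamping described above. It is not squigualiser's code; the function names are hypothetical, and it assumes that the bases without signal are the first `read_len - n_covered` bases of the read, as in the example.

```python
def covered_region(start, end, read_len, n_covered):
    """Clamp a 1-based, inclusive region to the bases that have signal.

    Assumption (from the example above): the bases without signal are
    the first `read_len - n_covered` bases of the read.
    """
    offset = read_len - n_covered        # 4 in the example
    first_covered = 1 + offset           # 1 + 4 = 5
    last_covered = n_covered + offset    # 9996 + 4 = 10000
    lo = max(start, first_covered)
    hi = min(end, last_covered)
    if lo > hi:
        raise ValueError("region has no signal coverage")
    return lo, hi


def rna_title_range(start, end, read_len, n_covered):
    """Format the clamped region in reversed order, as in the plot title."""
    lo, hi = covered_region(start, end, read_len, n_covered)
    return f"{hi}-{lo}"


print(rna_title_range(1, 100, 10000, 9996))       # region 1-100
print(rna_title_range(9950, 10000, 10000, 9996))  # region 9950-10000
```

Under this assumption the upper bound is the read length (9996+4), not the move table length, which is the difference between the expected 10000-9950 and the reported 9996-9950.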
Hi @JeremyQuo,
I think I get what you are saying. However, I cannot reproduce the bug with my test data. Can you please send me a minimal dataset and the commands to reproduce this bug? I really appreciate your support.
Sorry for replying late.
Here is my test data. It should cover 6-620, but the result does not look correct.
When I set the region to 500-650, it only covers 500-615.
When I set the region to 1-100, a bug appears.
squigualiser plot -f final.fastq -s file.blow5 -a file.paf -o result --rna --region 1-300
sequence file: final.fastq
alignment file: file.paf
signal file: file.blow5
Info: Signal to read method using PAF ...
plot region: 56-300 read_id: 562eeb47-2b86-4fc7-abfc-5dce62f511ed
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('line_color', 8315), ('x', 8000), ('y', 8000)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('fill_color', 8315), ('line_color', 8315), ('x', 8000), ('y', 8000)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('fill_color', 8315), ('hatch_color', 8315), ('line_color', 8315), ('x', 8000), ('y', 8000)
plot region: 1-300 read_id: 562eeb47-2b86-4fc7-abfc-5dce62f511ed
Traceback (most recent call last):
File "/home/zhguo/Program/anaconda3/envs/venv3/bin/squigualiser", line 33, in
I don't know why. Many thanks if you can help. test_data.zip
I suppose the index problem comes from the region option, which follows different logic from the case with no region option. I think the bug I hit is caused by an insertion; there seems to be a problem in how insertions are handled.
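The BokehUserWarning lines in the log above say that the columns passed to a ColumnDataSource have different lengths (8315 color entries against 8000 x/y points). A minimal sketch of a check that could be run on the column dictionary before plotting is shown here. It is plain Python, the helper name is hypothetical, and it is not part of squigualiser or Bokeh.

```python
def column_length_mismatch(columns):
    """Return {column name: length} if the columns differ in length, else {}.

    `columns` is a dict of column name -> sequence, i.e. the kind of data
    that would be passed to a ColumnDataSource.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) <= 1:
        return {}
    return lengths


# Lengths taken from the warning in the log above.
data = {
    "x": [0] * 8000,
    "y": [0] * 8000,
    "line_color": ["red"] * 8315,
}
print(column_length_mismatch(data))
```

A non-empty result means the per-base annotation (here the colors) was built for more points than the signal slice contains, which would fit an indexing problem in the region logic.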
@JeremyQuo,
Thanks a lot. I will get back to you asap.
Hi @JeremyQuo,
Thank you for reporting this bug. I have updated reform.py to support different stride values. Hopefully, this should have resolved the issue. I tested with your dataset as well. Please use the latest dev commit.
Hi @JeremyQuo,
Could you please let me know if this issue is resolved? I will close the issue for now. Feel free to reopen it.
Thanks for your help. Recently I have been using squigualiser to check my sequencing data, and I found a bug in the nucleotide labels in the plot.
Here is my command:
squigualiser plot -f final.fastq -s file.blow5 -a out.paf -o new_read -r 0eaa68b9-5989-44ca-8e00-52eba6ba3ccb --rna --region 1-100
When I changed the region to 1-200, something went wrong. The signal is correct and extends to 200 bases, but the base labels are wrong. Because the direction is from 5' to 3', the base labels on the x-axis are the same for 200-1 and for 100-1. In other words, the first 100 base labels are identical in both plots, but they are attached to different signals.
I guess this is because the first x-axis position is always labeled with the last base of the sequence. That is correct when plotting the whole sequence, but not when plotting a region. So I think a new method for labeling the nucleotides is required, for example using code like sequence[-200:-1].
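One detail about the slice suggested above: in Python, sequence[-200:-1] returns 199 bases, because the end index is exclusive and the last base is dropped; sequence[-200:] returns all 200. The sketch below shows that on a short sequence, plus a hypothetical helper (not squigualiser's code) that takes the labels from the requested region itself and reverses them for a reversed plot. Whether the region should be counted from the start or the end of the read is an assumption here.

```python
def region_labels(sequence, start, end, reverse=False):
    """Return the base labels for a 1-based, inclusive region.

    With reverse=True the labels are returned in reversed order, so the
    first label is the base at `end`, not the last base of the sequence.
    """
    bases = sequence[start - 1:end]
    return bases[::-1] if reverse else bases


seq = "ACGTACGTAC"

# Negative slicing: the end index is exclusive.
print(seq[-4:-1])  # 3 bases, last base dropped
print(seq[-4:])    # 4 bases, last base included

# Labels follow the region, so a shorter region is a suffix of a longer one.
print(region_labels(seq, 1, 4, reverse=True))
print(region_labels(seq, 1, 2, reverse=True))
```

With labels derived this way, the labels for region 1-100 are the last 100 labels of region 1-200 in a reversed plot, so each base keeps the same label in both.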