Open haruosuz opened 1 year ago
Hi Haruo.
The cat command produces a multi-fasta file by concatenating the 44 blaKPC-2/3 plasmids in the study. The test.fa is a single fasta from this dataset.
To cluster, use the -cl flag:
flanker --flank upstream --window 0 --wstop 5000 --wstep 100 --gene blaKPC-2 --fasta_file david_plasmids.fasta --include_gene -cl
which would give you a .csv of flank clusters for each --wstep, i.e., 100bp, 200bp, ..., 5000bp.
Thank you for the advice.
https://flanker.readthedocs.io/en/latest/#clustering
I used the --cluster flag with the following command to run Flanker:
flanker --fasta_file ${FASTA_FILE} --gene blaKPC-2 --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster
I have attached a zip file that contains the stdout, stderr, and output files generated by running my script with the following command:
FILE=./data/test.fasta
qsub -v fasta="$FILE" pbs.flanker.sh
The stderr file contained the following errors:
Error: Gene blaKPC-2 not found in MT560078.1
ERROR: could not open "/var/tmp/pbs.6848519.bias5-adm/tmp002m72ry/mash.msh" for reading.
The 50 output files (out_blaKPC-2_*) contained no results, only the following header line:
assembly_1,cluster
It looks like that plasmid contains blaKPC-3, not blaKPC-2: https://www.ncbi.nlm.nih.gov/nuccore/MT560078.1/. I wonder whether this is the case for the others?
Perhaps run this instead:
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster
which should grab either blaKPC-2 or blaKPC-3.
Thanks,
Will
Thank you for your suggestion. I used the --gene blaKPC option instead of --gene blaKPC-2, and provided the fasta file available at https://github.com/wtmatlock/flanker/blob/main/flanker/tests/data/test.fasta as the input fasta file (--fasta_file ${FASTA_FILE}).
I have attached a zip file that contains the stdout, stderr, and output files.
The stderr file again contained the following error:
ERROR: could not open "/var/tmp/pbs.6849120.bias5-adm/tmpgnonmznl/mash.msh" for reading.
I suspect that this may be the reason why none of the 50 output files (out_blaKPC_*) contained results.
Dear Will,
I hope this message finds you well.
I was wondering if you have any advice or suggestions on how to resolve the "ERROR: could not open" issue mentioned above?
I would greatly appreciate any help you can provide.
Thank you for your time and assistance.
Best regards, Haruo Suzuki
Dear Haruo,
My apologies, I am writing up my thesis at the moment so a little stretched thin!
Just to check, is flanker writing the flanking sequences okay? To cluster, flanker looks for files in your current working directory that end with flank.fasta.
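One way to answer that question is to count the files in the working directory whose names end in flank.fasta. This is a minimal sketch using only find and wc; the helper name count_flank_files is hypothetical and not part of Flanker, and it assumes the flank files sit directly in the directory being checked rather than in subdirectories.

```shell
# Count files directly inside a directory whose names end in "flank.fasta".
# The directory defaults to the current working directory.
# count_flank_files is a hypothetical helper, not a Flanker command.
count_flank_files() {
    find "${1:-.}" -maxdepth 1 -type f -name '*flank.fasta' | wc -l | tr -d ' '
}

echo "flank files found: $(count_flank_files .)"
```

A count of zero would suggest the flanks were never written; a count much larger than expected would suggest files left over from earlier runs are present in the same directory.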
Thanks,
Will
Thank you for your response, despite your busy schedule.
I have attached a zip file that contains the stdout, stderr, and output files generated by running the following commands:
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster
Based on my understanding, the first command generated files that end with flank.fasta.
The stderr file contained the following errors:
ERROR: could not open "/var/tmp/pbs.6865273.bias5-adm/tmpq82lfabj/mash.msh" for reading.
ERROR: could not open "/var/tmp/pbs.6865273.bias5-adm/tmp4jdf3be0/mash.msh" for reading.
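Of the 50 output files, only out_blaKPC_0 among those shown below contained results; the others contained only the header line (assembly_1,cluster), as follows: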
$wc -l out_* | head -n 3
2501 out_blaKPC_0
1 out_blaKPC_100
1 out_blaKPC_1000
$head -n 3 out_blaKPC_0
assembly_1,cluster
ENA_CABFYD010000003_CABFYD010000003.1_blaKPC-3_3700_upstream_flank.fasta,0
ENA_CABFYI010000002_CABFYI010000002.1_blaKPC-3_3500_upstream_flank.fasta,0
$tail -n 3 out_blaKPC_0
MT560066.1_blaKPC-2_1100_upstream_flank.fasta,62
ENA_CABGBR010000009_CABGBR010000009.1_blaKPC-2_1000_upstream_flank.fasta,63
ENA_CABFYR010000005_CABFYR010000005.1_blaKPC-2_1000_upstream_flank.fasta,63
$head out_blaKPC_100
assembly_1,cluster
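The manual wc and head checks above can be sketched as a small helper that lists every out_* file holding at most one line, that is, only the header or nothing at all. The function name header_only_outputs is hypothetical and not part of Flanker; it assumes the output files sit in the directory passed as the first argument.

```shell
# Print the names of out_* files that contain at most one line,
# i.e. only the "assembly_1,cluster" header or no content.
# header_only_outputs is a hypothetical helper, not a Flanker command.
header_only_outputs() {
    for f in "${1:-.}"/out_*; do
        # Skip the literal pattern when no out_* file exists.
        [ -f "$f" ] || continue
        lines=$(wc -l < "$f" | tr -d ' ')
        if [ "$lines" -le 1 ]; then
            basename "$f"
        fi
    done
}

header_only_outputs .
```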
Hi Haruo,
I just ran the clustering mode as follows. First, I made test.fasta containing the contigs CABFYD010000003.1 and CABFYI010000002.1. Then, I ran:
$flanker --fasta_file * --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 -cl
found 2 files
found 2 files
...
found 2 files
Checking via your method:
$wc -l out_* | head -n 3
3 out_blaKPC_0
3 out_blaKPC_100
3 out_blaKPC_1000
Indeed, it seems to have worked:
$cat out* | sed '/assembly/d' > all_out
$cat all_out | head -6
ENA|CABFYI010000002|CABFYI010000002.1_blaKPC-3_0_upstream_flank.fasta,0
ENA|CABFYD010000003|CABFYD010000003.1_blaKPC-3_0_upstream_flank.fasta,0
ENA|CABFYD010000003|CABFYD010000003.1_blaKPC-3_100_upstream_flank.fasta,0
ENA|CABFYI010000002|CABFYI010000002.1_blaKPC-3_100_upstream_flank.fasta,0
ENA|CABFYI010000002|CABFYI010000002.1_blaKPC-3_1000_upstream_flank.fasta,0
ENA|CABFYD010000003|CABFYD010000003.1_blaKPC-3_1000_upstream_flank.fasta,0
I would suggest first trying to reproduce the above. If that works, you should be good:)
Thanks,
Will
Dear Will,
Thank you for your previous response.
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster
I have a question about the need for the first command that I used, which generated files ending with flank.fasta.
You wrote: "I would suggest first trying to reproduce the above."
I failed to reproduce your results. Here are the steps I followed:
First, I made my_test.fasta:
seqkit grep -nrp "CABFYD010000003.1|CABFYI010000002.1" data/test.fasta > my_test.fasta
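Before running Flanker on a subset like this, it may be worth confirming how many records the file actually holds. The following is a minimal sketch using only awk; count_fasta_records is a hypothetical helper name, and expecting two records assumes each accession pattern matches exactly one sequence.

```shell
# Count FASTA records by counting header lines (lines starting with ">").
# count_fasta_records is a hypothetical helper, not part of seqkit or Flanker.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Only report when the subset file from the step above exists.
if [ -f my_test.fasta ]; then
    echo "records in my_test.fasta: $(count_fasta_records my_test.fasta)"
fi
```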
Then, I ran the clustering mode as follows.
$flanker --fasta_file my_test.fasta --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 -cl
This generated messages like the following:
found 2,102 files
ERROR: could not open "/tmp/tmp02lduhz7/mash.msh" for reading.
It appears that Flanker is experiencing difficulties accessing the mash.msh file.
I have attached a zip file that contains the
I have the same problem. In clustering mode, I get the same error as above. Could you provide a Docker container to improve reproducibility?
Hi there,
I have tried again to reproduce this issue - see above your post - and am having no luck. What kind of system are you running it on?
Thanks,
Will
Hi, I'm using Ubuntu 22. On this system I always have problems installing, but on CentOS there are no errors, so I don't know whether the system could be the cause. I do have a suggestion: to avoid these kinds of errors, it might be better to release a Docker image in the future.
Yes, we are working on a Docker install now, actually. Hopefully we will have it on the repo soon!
Hello,
https://flanker.readthedocs.io/en/latest/
I am having difficulty understanding some of the information in the documentation. In order to fully understand and reproduce the information, could you please provide more details about the *fsa files shown below?
I am wondering if the david_plasmids.fasta file and the file located at https://github.com/wtmatlock/flanker/blob/main/flanker/tests/data/test.fasta are nearly identical?
I came across a small typo in the command line argument for Flanker in the documentation. In the following command:
The "--wstop" and "--wstep" options should be written with a single dash instead of a double dash, as follows:
Using double dash instead of single dash with these options results in an error, as shown below:
Could you please provide the command to produce the out* files shown below?