kwonej0617 closed this issue 1 year ago.
Hi @kwonej0617,
As far as I can remember, this step is quite fast. Can you tell me the size of your input file and the command line you used?
Best, Huanle
@Huanle Thank you for your response.
The BAM files that took a long time to process with Slide_Variants are 937 MB and 1.4 GB. They took around 96 hours.
Thank you.
Thanks @kwonej0617, would you mind sharing the input file to slide_variants with me?
Sure. You can use the following link to download the data. Please let me know if you are unable to access it. https://drive.google.com/drive/folders/1PlxuD0YLRN-U6tU4mHqagUGNyJ3OrBkM?usp=drive_link
The file is gzipped, but when I ran slide_variants I used the decompressed file.
Thank you so much for your help.
I have requested access to download the data but have not been approved yet. Can you approve my request so that I can move forward?
Cheers - Huanle
@Huanle I changed the access permission. Could you please try again with the link below? https://drive.google.com/file/d/1jwl56Q1WhhXUuRFvhO8x3Apmf9fHGa0D/view?usp=sharing
Thank you so much for your help!
Hello EpiNano developers,
- I am also wondering about this. Processing the BAM file with $EPINANO_HOME/Epinano_Variants.py is fairly fast (within a day), but then running Slide_Variants.py to get k-mers for the plus- and minus-strand sample_strand.per.site.csv files takes about 2 days each, using: python /path/to/EpiNano/misc/Slide_Variants.py sample.minus_strand.per.site.csv 5
- The input .csv files are about 500-600 MB each.
- Is there any way to speed this up by providing more cores?
- Please let me know at your earliest convenience.
Thank you for the assistance, Raj
Hi Raj,
Sorry for the late reply. I have been occupied by other tasks. Now I am working on it.
Best regards, Huanle
@kwonej0617 I have committed a new version of slide_variants.py. Once it is accepted by the owner of this repo, you can give it a go.
Can you please test the new slide_variants script from EpiNano 1.2.3? Thanks
Hello Huanle,
Thank you for your help.
The new version has also been running for 2 days so far.
I started Slide_Variants.py on Wednesday morning and it is still running as of Friday afternoon (PT).
I used the same command as before.
Let me know if there is an option I can use to specify the number of threads/cores, or if there is anything else I can do to speed this up.
Much appreciated, Raj
Hi @doshirLV,
I do not think the script has been successfully committed to GitHub, so I have attached it here. Please rename it as a Python script (.py) before using it. Let me know if you encounter any issues.
Best, Huanle
Dear Huanle,
This new version is much faster. It completes in less than an hour.
The new Slide_Variants.py script should be committed to the EpiNano GitHub page, since it greatly improves the tool.
One thing I noticed: the script refers to Epinano_Variants.py, but it should say "slide_variants" instead, if I am not mistaken.
The script also produces a file named sample.plus_strand.per.site.csv.non-consecutive-sites. What is this file for? Is it used in any downstream steps (e.g. Epinano_Predict.py)? Do I need any additional information from this file for my analysis?
With gratitude, Raj
Hi @doshirLV,
Your note is right. The script is meant to replace slide_variants, not Epinano_Variants.
Regarding the non-consecutive-sites file: it contains sites that cannot be used to construct windows/k-mers, because there is no information for the sites/positions right next to them. You can therefore ignore or delete the file, or comment out the part of the script that generates it.
I will ask Eva to commit the relevant script.
Best, Huanle
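To make the explanation of the non-consecutive-sites file concrete, here is a minimal Python sketch. It is not EpiNano's actual implementation; it assumes per-site rows have been reduced to hypothetical (reference, position, value) tuples, already sorted by reference and position, and a window size k (the trailing 5 in the Slide_Variants.py command quoted in this thread is presumably that size). In one pass it splits the rows into runs of adjacent positions: every site in a run of at least k positions lands in some full k-site window, while sites in shorter runs cannot be placed in any full window, which is one reading of what ends up in a "non-consecutive-sites" file.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# Hypothetical simplified per-site record: (reference, position, feature value).
# Real EpiNano per-site CSVs carry more columns; this is only an illustration.
Site = Tuple[str, int, float]


def consecutive_runs(sites: Iterable[Site]) -> List[List[Site]]:
    """Split sites (sorted by reference, then position) into runs of adjacent positions."""
    runs: List[List[Site]] = []
    for _, group in groupby(sites, key=lambda s: s[0]):  # one reference at a time
        current: List[Site] = []
        for site in group:
            if current and site[1] != current[-1][1] + 1:
                runs.append(current)  # gap in positions: close the current run
                current = []
            current.append(site)
        if current:
            runs.append(current)
    return runs


def slide_windows(sites: Iterable[Site], k: int = 5) -> Tuple[List[List[Site]], List[Site]]:
    """Return (windows, leftovers).

    windows: every full k-site window made of consecutive positions.
    leftovers: sites in runs shorter than k, which fit in no full window.
    """
    windows: List[List[Site]] = []
    leftovers: List[Site] = []
    for run in consecutive_runs(sites):
        if len(run) < k:
            leftovers.extend(run)  # not enough neighbours to build a k-mer
            continue
        for start in range(len(run) - k + 1):
            windows.append(run[start:start + k])
    return windows, leftovers


if __name__ == "__main__":
    demo = [("chr1", 1, 0.1), ("chr1", 2, 0.2), ("chr1", 3, 0.3),
            ("chr1", 10, 0.4), ("chr1", 11, 0.5)]
    windows, leftovers = slide_windows(demo, k=3)
    print([[s[1] for s in w] for w in windows])  # [[1, 2, 3]]
    print([s[1] for s in leftovers])             # [10, 11]
```

Because the input is visited once and each window is a slice of length k, the work grows linearly with the number of sites for a fixed k; the sketch makes no claim about why the earlier script was slower.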
Hi, thank you for providing a useful tool. I have run EpiNano on my dataset and found that the Slide_Variants step took too much time. I was wondering if there is a threads or processors option that could help reduce the processing time and produce the output faster.
Thank you!
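Nothing in this thread shows a threads or cores option for Slide_Variants.py; the resolution here was a faster rewrite of the script. If more speed were still needed, one generic approach is to process each reference independently in separate worker processes, since k-mer windows never span two references. The minimal Python sketch below assumes the per-site CSV has a header row and the reference name in its first column (check this against your own file), and uses a hypothetical process_reference placeholder that only counts rows where real window building would go.

```python
import csv
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, List, Tuple


def split_by_reference(csv_path: str) -> Dict[str, List[List[str]]]:
    """Group CSV rows by their first column (assumed here to be the reference name)."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed present)
        for row in reader:
            if row:  # ignore blank lines
                groups[row[0]].append(row)
    return dict(groups)


def process_reference(item: Tuple[str, List[List[str]]]) -> Tuple[str, int]:
    """Hypothetical placeholder for per-reference window building; it only counts rows."""
    ref, rows = item
    return ref, len(rows)


def process_in_parallel(csv_path: str, workers: int = 4) -> Dict[str, int]:
    """Run process_reference on each reference in a pool of worker processes."""
    groups = split_by_reference(csv_path)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_reference, groups.items()))
```

For example, process_in_parallel("sample.plus_strand.per.site.csv", workers=8) should be called under an if __name__ == "__main__": guard, which ProcessPoolExecutor needs on platforms that start workers by spawning. The speed-up depends on how many references the file contains and how evenly sites are spread across them, and shipping rows to worker processes has its own overhead.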