thomasvangurp / epiGBS

Code for working with epiGBS data
MIT License
10 stars 7 forks source link

merge_watson_crick.py very slow #24

Open FleurGaBru opened 5 years ago

FleurGaBru commented 5 years ago

Hi Thomas!

Since your last commit on 25 January, the epiGBS pipeline has an increased running time by +/- 10 hours. It seems that this is due to changes that you made in the merge_watson_crick.py script. As far as I understand, you added several "if" statements to filter positions that contain "0" information and you included the contig positions. I do not fully oversee, why those changes were made? Are they worth the trade-off in time-investment? At the moment we keep on running an older version of the script. It would be great if you could have a look into that!

Best, Fleur

thomasvangurp commented 5 years ago

Hi Fleur,

This was done to make the script work with reference genomes. I guess this could be solves by using : Try: Int(watson_line[0]) Except ValueError: if watson_line[0] not in chrom_pos: chrom_pos.append(watson_line[0])

and try: int(watson_line[0]) watson_line[0] > crick_line[0]: Except ValueError: If: chrom_pos.index(watson_line[0]) > chrom_pos.index(crick_line[0]): read_watson = False read_crick = True

Could you test this and make a merge request?

Kind regards, Thomas Van: FleurGaBru notifications@github.com Beantwoorden - Aan: thomasvangurp/epiGBS reply@reply.github.com Datum: vrijdag 3 mei 2019 om 08:57 Aan: thomasvangurp/epiGBS epiGBS@noreply.github.com CC: Subscribed subscribed@noreply.github.com Onderwerp: [thomasvangurp/epiGBS] merge_watson_crick.py very slow (#24)

Hi Thomas!

Since your last commit on 25 January, the epiGBS pipeline has an increased running time by +/- 10 hours. It seems that this is due to changes that you made in the merge_watson_crick.py script. As far as I understand, you added several "if" statements to filter positions that contain "0" information and you included the contig positions. I do not fully oversee, why those changes were made? Are they worth the trade-off in time-investment? At the moment we keep on running an older version of the script. It would be great if you could have a look into that!

Best, Fleur

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/thomasvangurp/epiGBS/issues/24, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAJZU4WAR6FZHMF7JGMXBRDPTPO6PANCNFSM4HKRTJTQ.

FleurGaBru commented 5 years ago

Awesome, thanks for your explanation and suggestion! The reference compatibility is indeed a good one! We will try this and merge after testing.

thomasvangurp commented 4 years ago

@FleurGaBru any update on this?

FleurGaBru commented 4 years ago

I think Samar had a look at this but we did not solve it. As a work-around we use the merge_watson_crick script from before 25 January when running the pipeline in "de novo" mode and only use the newer version of the script in the reference branch. Do you think this is an appropriate way of doing it?

FleurGaBru commented 4 years ago

Btw: We (especially Maarten :) ) have done some other fixes, like fixing the "spiking" pattern in the methylation site coverage and an issue with missing the last reference sequence of a fasta file during mapping. Do you want them merged in your github?