very different corrections when bootstrapping

kr-colab / ReLERNN

Recombination Landscape Estimation using Recurrent Neural Networks

MIT License

48 stars 12 forks source link

very different corrections when bootstrapping #45

Open jbruxaux opened 7 months ago

jbruxaux commented 7 months ago

Hi!

I used ReLERNN to estimate the recombination rate along a very long genome, and ran the analyses by pieces of 500Mb. The results between the different parts are comparable when I use the results of the "predict" function, but differ a lot after correction with the "bscorrect" function. For example, before correction: recombination_rate_1Mb_chr8_2 And after correction: Any idea what could cause such differences? Is there anything I should do? Thanks in advance!

andrewkern commented 4 months ago

interesting behavior here. is there any reason to believe that that portion of the chromosome is non-recombining?

jbruxaux commented 4 months ago

No, not at all. Each portion is 500Mb long (except the last one that is shorter) and harbor a lot of SNPs, and there is nothing different on any other parameter for that region (heterozygosity, nucleotide diversity...).

andrewkern commented 4 months ago

can you rerun the BSCORRECT module? I wonder if you will get a different answer?

jbruxaux commented 4 months ago

I will have to wait for a few days (GPU nodes in maintenance on our cluster), but I will try as soon as possible.

andrewkern commented 4 months ago

you might also try this on a cpu?

jbruxaux commented 3 months ago

Hi!

Sorry, it took me a few weeks to come back to this, but I re-ran the analysis on a smaller chromosome (only the BSCORRECT part), and I get the exact same result as last time (using the cmp command). So this part is at least consistent.

andrewkern commented 3 months ago

okay! this is very strange. any chance this has to do with seeds being maintained across runs? are you setting seeds manually?

jbruxaux commented 3 months ago

Yes, I used the same seed! I re-ran again with different seeds, and here is what I get (for a smaller chromosome). The original run: recombination_rate_1Mb_chr12_3 The new run: recombination_rate_1Mb_chr12_4 For reference, the uncorrected results: recombination_rate_1Mb_chr12_2

andrewkern commented 3 months ago

okay this is definitely a seed issue. when is the last time you pulled the code? we recently fixed a potential bug in commit #48 which may affect bootstrapping with a manual seed.

nspope commented 3 months ago

what version are you using @jbruxaux ? there was a bug fixed wrt to setting seeds a couple months ago (here) but you'll have to use the current github version of the repo

andrewkern commented 3 months ago

what version are you using @jbruxaux ? there was a bug fixed wrt to setting seeds a couple months ago (here) but you'll have to use the current github version of the repo

jinx

jbruxaux commented 3 months ago

A quick update. I re-ran the analysis from scratch without any seed (and with the last ReLERNN version). I still see this pattern, but not as strongly. Here are the non-corrected results: recombination_rate_1Mb_chr12_6

And here are the corrected results: recombination_rate_1Mb_chr12_5

What do you think? As you can see, my third part (from 1000 in the graph, so 1Gb in reality) shows almost no correction compared to the two others.

nspope commented 6 days ago

Hey @jbruxaux sorry for the long silence ... if you still have your script handy, could you please post the exact invocation for BSCORRECT? Were you using the default arguments; or a copy-paste from the examples/example_pipeline.sh (e.g. with --nSlice 2 --nReps 2); or something else?

If it's the copy-paste from example_pipeline.sh, then I think that's the issue -- two slices and two reps per slice is way to few (the defaults are 100 slices, 1000 reps per slice or thereabouts) which would make the correction very very noisy.

Thanks!

jbruxaux commented 6 days ago

I indeed used the example_pipeline.sh script, with --nSlice 2 --nReps 2. So that explains the issue. I will give it another try with more reasonable values. Thank you for your feedback!