qmarcou / IGoR

IGoR is a C++ software designed to infer V(D)J recombination related processes from sequencing data. Find full documentation at:
https://qmarcou.github.io/IGoR/
GNU General Public License v3.0

Continuing a random number sequence #16

Closed sinkovit closed 6 years ago

sinkovit commented 6 years ago

In our research, we anticipate the need to generate very large synthetic repertoires. Since this process can take a long time, it would be nice to have the ability to pick up the random number sequence where we left off so that the repertoire generation does not need to be done as a single compute job.

Although we can probably choose a new seed for each run - for a 64-bit random number generator it is highly unlikely that we would choose a seed whose sequence overlaps the previous one - it would be better to continue where we stopped.

This is a low-priority request and we are happy to assist.
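
For reference, the C++ standard library engines can be saved and restored through their stream operators, which is one way a run could pick up where the previous one stopped. The sketch below assumes a `std::mt19937_64` engine; whether IGoR uses this engine internally is an assumption, and `save_state`, `load_state` and `check_resume` are hypothetical helper names, not IGoR functions.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <sstream>
#include <stdexcept>
#include <string>

// Serialize the complete engine state as text (could be written to a file
// at the end of a compute job).
std::string save_state(const std::mt19937_64& rng) {
    std::ostringstream out;
    out << rng;
    return out.str();
}

// Rebuild an engine from text produced by save_state.
std::mt19937_64 load_state(const std::string& text) {
    std::mt19937_64 rng;
    std::istringstream in(text);
    in >> rng;
    if (!in) {
        throw std::runtime_error("could not parse saved RNG state");
    }
    return rng;
}

// Self-check: a restored engine must reproduce the original's next draws.
void check_resume(std::uint64_t seed) {
    std::mt19937_64 original(seed);
    for (int i = 0; i < 1000; ++i) original();  // stands in for the first job
    std::mt19937_64 resumed = load_state(save_state(original));
    for (int i = 0; i < 1000; ++i) {
        assert(original() == resumed());
    }
}
```

Note that any distribution object with internal state (for example `std::normal_distribution`) would need to be saved the same way, or reset, for the continuation to be exact.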

qmarcou commented 6 years ago

Hi @sinkovit, sorry for the long time it took me to answer this one. I originally looked into making this possible; however, it would require changing quite a few of the functions used for generating sequences. I have conducted a small experiment with IGoR's new random seed generator, using the following piece of code in the custom code section of the main:

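    else{
        //Write your custom procedure here
        // Draw ~10^8 seeds and write them one per line for offline analysis
        const size_t n_seeds = 99999999;
        ofstream file("/tmp/random_seeds.csv");
        for(size_t i = 0; i != n_seeds; ++i){
            // '\n' instead of endl avoids flushing the file on every line
            file << draw_random_64bits_seed() << '\n';
        }
    }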

Importing the corresponding file with pandas in Python and making a histogram of it with 5000 bins, I get the following figure: [figure: seed_test]

I think this shows that the seed generator is indeed uniformly random (1.84x10^19 is approximately 2^64, the number of values an unsigned 64-bit integer can take), so getting the same seed twice is very unlikely. I have actually checked, and all 99999999 seeds were unique.
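
As a rough cross-check of the "very unlikely" claim, the usual birthday approximation gives the chance of at least one duplicate among n uniform draws from 2^64 values as about 1 - exp(-n(n-1)/2^65). The sketch below assumes the seeds are independent and uniform over the full 64-bit range; `duplicate_probability` is a hypothetical helper, not part of IGoR.

```cpp
#include <cassert>
#include <cmath>

// Birthday approximation: probability that at least two of `n_draws`
// independent uniform draws from a space of 2^bits values coincide.
double duplicate_probability(double n_draws, int bits) {
    assert(bits > 0 && n_draws >= 0.0);
    const double space = std::ldexp(1.0, bits);  // 2^bits
    const double exponent = n_draws * (n_draws - 1.0) / (2.0 * space);
    // 1 - exp(-x), computed with expm1 to stay accurate for tiny x
    return -std::expm1(-exponent);
}
```

Under these assumptions, `duplicate_probability(99999999.0, 64)` comes out at roughly 2.7x10^-4, so finding no duplicate among the 99999999 seeds is the expected outcome. For a handful of runs with distinct seeds the probability is many orders of magnitude smaller still.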

Given this information I think this is a won't fix (at least not in the near future)

Quentin

sinkovit commented 6 years ago

Hi Quentin,

Thanks for the follow-up. I agree that this can be left off the list of software improvements.

By the way, I recently used IGoR to generate some repertoires containing on the order of one billion productive reads.

-- Bob

qmarcou commented 6 years ago

Hi Bob, I have closed the issue, but it is great to hear that you managed to produce so many reads! Out of curiosity: did this process take a lot of computation time? Did you feel it was a bottleneck in your analysis? Do you think parallelizing random sequence generation is worth doing? It would most likely be limited by I/O time then. Best,

sinkovit commented 6 years ago

Hi Quentin,

I don’t think that you need to parallelize the sequence generation process. I just ran ten instances of IGoR in parallel using different seeds. Given the repeat length (period) of the RNG, it’s highly unlikely that the different seeds would generate overlapping sets of random numbers. I was able to confirm this by constructing rarefaction curves (number of unique sequences vs. total number of sequences).
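
A minimal sketch of such a curve is given below. It assumes the generated sequences fit in memory as strings and are accumulated in file order; `rarefaction_curve` and its `step` parameter are hypothetical names used for illustration, not the procedure actually used here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Returns (total sequences seen, unique sequences seen) pairs, sampled every
// `step` sequences and once more at the end of the input.
std::vector<std::pair<std::size_t, std::size_t>>
rarefaction_curve(const std::vector<std::string>& sequences, std::size_t step) {
    assert(step > 0);
    std::unordered_set<std::string> seen;
    std::vector<std::pair<std::size_t, std::size_t>> curve;
    for (std::size_t i = 0; i < sequences.size(); ++i) {
        seen.insert(sequences[i]);
        const std::size_t total = i + 1;
        if (total % step == 0 || total == sequences.size()) {
            curve.emplace_back(total, seen.size());
        }
    }
    return curve;
}
```

If two runs had produced overlapping random number streams, the pooled curve would show fewer unique sequences than curves built from independent runs. For repertoires on the order of a billion reads, an in-memory set like this is unlikely to be practical; sorting on disk or hashing the sequences would be more realistic.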

-- Bob
