iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

100a/36a test is too slow #31

Closed iqbal-lab closed 7 years ago

iqbal-lab commented 8 years ago

Sorina, I tried shortening the 100a.txt PRG to 36a.txt, but the test still does not finish overnight, even with -O3. It seems to me we'd get the same value from a shorter string, even down to 10a; otherwise the total number of substrings is just too big. OK with you? Otherwise, we could have a set of veryslow tests that run for weeks, but if we were doing that I might target other tests than this one.

sm0179 commented 8 years ago

Are you sure it's not a bug somewhere? How can we map 80x of Plasmodium reads in 3 min and we can't map the substrings of 100a overnight?

iqbal-lab commented 8 years ago

The number of substrings of any length of a 100 character string is 100 + 100-choose-2 + 100-choose-3 +.....

sm0179 commented 8 years ago

errr - why combinations? aren't there 99 substrings of size 2, 98 of size 3, etc?

sm0179 commented 8 years ago

I think you were thinking of subsequences rather than substrings
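
To make the difference concrete, here is a minimal Python sketch that counts both quantities for a string of length n. The function names are illustrative only and are not part of gramtools. Substrings are counted by position, so identical substrings starting at different offsets are counted separately.

```python
from math import comb


def count_substrings(n: int) -> int:
    """Contiguous, non-empty substrings of a length-n string, counted by position.

    There are n - k + 1 substrings of length k, for k = 1..n.
    """
    return sum(n - k + 1 for k in range(1, n + 1))


def count_subsequences(n: int) -> int:
    """Non-empty subsequences of a length-n string, counted by position.

    Choosing which k positions to keep gives n-choose-k for each k = 1..n.
    """
    return sum(comb(n, k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (10, 36, 100):
        print(n, count_substrings(n), count_subsequences(n))
```

For n = 100 the substring count is n(n+1)/2 = 5050, whereas the subsequence count is 2^100 - 1. And since 100a is a single repeated character, only 100 of those substrings are distinct. A few thousand reads should not take overnight to map.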

iqbal-lab commented 8 years ago

Ah yes!

iqbal-lab commented 8 years ago

So in that case no idea why the test takes so long.

sm0179 commented 8 years ago

maybe good to print each substring as it gets mapped and see if there is one for which it freezes?
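
A rough sketch of that kind of instrumentation, in Python for illustration. The `map_read` callable is a hypothetical stand-in for whatever the test actually calls to map a read (it is not a gramtools function), and the one-second threshold is an arbitrary assumption.

```python
import time


def substrings(s):
    """Yield (start, substring) for every contiguous, non-empty substring of s."""
    for length in range(1, len(s) + 1):
        for start in range(len(s) - length + 1):
            yield start, s[start:start + length]


def time_each_substring(s, map_read, slow_threshold_s=1.0):
    """Map every substring of s, printing before each call and timing it.

    map_read is a hypothetical callable standing in for the real mapping step.
    Returns a list of (start, length, seconds) for calls slower than the threshold.
    """
    slow = []
    for start, sub in substrings(s):
        # Print and flush *before* mapping, so that if the call hangs,
        # the last line printed identifies the offending substring.
        print(f"mapping start={start} len={len(sub)}", flush=True)
        t0 = time.perf_counter()
        map_read(sub)
        elapsed = time.perf_counter() - t0
        if elapsed > slow_threshold_s:
            slow.append((start, len(sub), elapsed))
    return slow


if __name__ == "__main__":
    # Dummy mapper that does nothing, just to show the call shape.
    print(time_each_substring("a" * 5, map_read=lambda read: None))
```

Flushing before each call matters here: if one substring freezes the mapper, the last line printed shows which one it was.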

iqbal-lab commented 8 years ago

Yes, would be good to do. Let's raise a bug about it so we don't lose it, but it doesn't seem very high priority: we have a lot of tests of PRGs with no variation. I mean, compared with getting the basic functionality of what is in the sites object unit tested, and the end-crash and the assert fail, etc.

iqbal-lab commented 8 years ago

Oh - this is a bug

ffranr commented 7 years ago

All unit tests are running quickly and passing.