neherlab / pangraph

A bioinformatic toolkit to align genome assemblies into pangenome graphs
https://neherlab.github.io/pangraph
MIT License
87 stars 7 forks source link

Rust marco dev #66

Closed mmolari closed 8 months ago

mmolari commented 8 months ago

I fixed some bugs in the implementation of minimizer sketching and added nontrivial tests. The bugs were simply due to the fact that Julia has 1-based indexing, while rust is 0-based. I hope I catched caught all of them. The test were generated by running the original julia code on a random 100-nucleotide sequence, and getting the resulting minimizers.

ivan-aksamentov commented 8 months ago

Oh cool!

My only worry is that positions in the result look suspiciously like u64 max :) Did we hit some edge case here?

I'd merge, but I see that you've added bunch of directories from the Julia branch to gitignore. They are not relevant to Rust version and this might bite us later - I will definitely miss that some files did not get committed at some point in the future, and might loose some of the work progress.

Here is a potential alternative: I clone rust branch as a separate directory, this way I can have both Julia and Rust versions available at the same time, and I don't need to switch and worry about untracked directories specific to one or the other version. It's a complete rewrite, so switching the same local repo is very inconvenient. Do you think you can also adopt this strategy and revert the gitignore entries? Otherwise you could also try to add them to your local gitignore file called .git/info/exclude, where .git is the directory in the root of the checked out project.

mmolari commented 8 months ago

Thanks for all of the feedback!

I also added some more corrections. They are all due to the fact that:

mmolari commented 8 months ago

I also fixed some small bugs in the mash_distance function and added some non-trivial tests. I think we might have still some small conceptual problems here: