merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0
439 stars 145 forks source link

python 2/3? #343

Closed bcthomas closed 7 years ago

bcthomas commented 8 years ago

Hi Meren,

As far as I can tell from looking at the code base, all of your dependencies will work with python 3. Is there some reason you're staying with python 2 for anvio?

Cheers, and great seeing you a few weeks back in Berkeley!

bct

meren commented 8 years ago

Hi Brian,

Great to hear from you! I hope all is well in Berkeley.

Maybe there were some valid reasons at the very beginning, but there is no outstanding reason to not switch at this point since all I/O issues are resolved, and most of the libraries are ported to v3.

The only remaining issue is time :/ Doing it properly will take some effort and testing.

Would you vote for switching to 3?

Best,

bcthomas commented 8 years ago

I would always vote for switching to 3 (so would the python steering committee ;). It's not high priority, agreed, but more convenience at this point. But, from looking through the code in anvio, i think you're looking a almost exclusively print() fixes. Let me dig a little deeper in and I'll update.

bcthomas commented 8 years ago

@meren I found the following problem: clustering.py uses hcluster. This is not py3 ready (but I haven't looked at why). However, for hierarchical clustering, there are alternatives that could be used (or roll your own?).

I also found these: sge.py uses subprocess and utils.py uses multiprocess. These may be fine as is, but also you may have to recode some of their content (haven't looked yet).

Other that these, everything else looks to be a print() fix.

meren commented 8 years ago

hcluster is the fastest and most memory efficient hierarchical clustering solution I could find. It probably is not being actively maintained :/

The other potential issues do not seem to be very hard to solve, and it seems there aren't many print statements in the code:

$ find . -name '*.py' | xargs grep '    print' | grep -v 'lib\|sandbox\|pretty\|__init__' | wc -l
      14

If I find a something better for hierarchical clustering, I think the switch is very doable.

Best,

bcthomas commented 8 years ago

There are a few more areas that need to be addressed. I will investigate some more and report back. I've cloned the repo and I'll just see if I can make it work with py3.

gkmngrgn commented 8 years ago

@bcthomas hello, did you start for this task? This weekend I can try to run anvio with python3.

bcthomas commented 8 years ago

Yep, there ended up being a lot more to it. I have a few days of work invested in it.

Besides print(), there were many more simple fixes that 2to3 helped to identify. However several of the C-based extensions are taking me longer to complete and hcluster is going to take some more time to update too. I can try to wrap up my changes this weekend.

bct

On Apr 8, 2016, at 7:48 AM, Gokmen Gorgen notifications@github.com wrote:

This weekend I can try to run anvio with python3.

meren commented 8 years ago

Yep, there ended up being a lot more to it. I have a few days of work invested in it.

Oh, I didn't know you were spending time on this (I thought you had to move on to other stuff). I am sorry to hear it did cost you some time :(

Please let us know if there is anything @gkmngrgn or I can be of help!

Thank you both very much.

Best wishes,

meren commented 8 years ago

I will soon be working on the codebase to get rid of the hcluster dependency, and take care of all the clustering operations using SciPy.

After that it will be much easier to complete this probably.

meren commented 8 years ago

Done. We no longer have a dependency for hcluster. Some details are listed in #372.