robmaz / distmap

Sequence alignment on Hadoop
0 stars 1 forks source link

Basic cleanup & code reformatting (perltidy defaults), removal of #1

Closed robmaz closed 6 years ago

robmaz commented 7 years ago

hard-coded paths hopefully done, ReadTools used for upload

magicDGS commented 6 years ago

I am on vacations and I can't review because I don't have a proper computer to do so. But several things that I saw while looking at your PR at the first glance:

  1. This PR addresses 2 different problems: tiding up and addition of ReadTools. My recommendation is to separate both into two different PRs to make easier the reviewing process. For example, this PR can maintain the tiding up, and a different one the substitution of the ReadTools
  2. Correct me if I am wrong, but my impression is that the new DistMap is going to use the readtools (and other software) in the PATH. I don't like to assume that ReadTools is installed by brew, because I personally prefer to use the jar file by now (and most people too) until I add a proper wrapper script to run it with several options.
  3. ReadTools should be checked for the version which includes the upload/download code (>=1.1.0), and remember that trimming in the cluster shouldn't be supported in the next DistMap (should be done locally with ReadTools, which have a different argument set).
robmaz commented 6 years ago

Hi Daniel,

thanks for checking in. I hope you are enjoiying your vacation days.

I am sorry about the confused PR, still getting the hang of this. In fact I only meant for you to look at the readtools up-/download implementation, which you basically cannot find now since every other line has changed due to the perltidying. The relevant changes are the argument processing in distmap itself, fastq2tab_{p,s}e_java in bin/DataProcess.pm, and the new bin/DataDownloadAndMerge.pm

I think it is a problem that the readtools jar is now so large, ~266 MB vs. the 5 MB of the old ReadTools.jar in distmap. Basically we cannot distribute it with distmap anymore, so it needs to be found somehow. I am not against a possibility to specify the jar if one does not have the brew installation, should be easy enough to add,

$readtools = "java -Xmx8g -Dsnappy.disable=true -jar $readtools" if ($readtools =~ /.jar$/);

Of course the "finding in the path" fallback method cannot work with a jar.

I want to keep the interface intact for the next version, so there needs to be default trimming and a --only-trim option. But I think we can write a deprecation warning with a pointer to the ReadTools command when trimming is used. Maybe I can also revert to the old perl script so that I can already remove the outdated jars; this should be functionally equivalent, no?

Cheers Rupert

2017-11-16 7:59 GMT+01:00 Daniel Gómez-Sánchez notifications@github.com:

I am on vacations and I can't review because I don't have a proper computer to do so. But several things that I saw while looking at your PR at the first glance:

  1. This PR addresses 2 different problems: tiding up and addition of ReadTools. My recommendation is to separate both into two different PRs to make easier the reviewing process. For example, this PR can maintain the tiding up, and a different one the substitution of the ReadTools
  2. Correct me if I am wrong, but my impression is that the new DistMap is going to use the readtools (and other software) in the PATH. I don't like to assume that ReadTools is installed by brew, because I personally prefer to use the jar file by now (and most people too) until I add a proper wrapper script to run it with several options.
  3. ReadTools should be checked for the version which includes the upload/download code (>=1.1.0), and remember that trimming in the cluster shouldn't be supported in the next DistMap (should be done locally with ReadTools, which have a different argument set).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/robmaz/distmap/pull/1#issuecomment-344833492, or mute the thread https://github.com/notifications/unsubscribe-auth/Ad_FfFFZdGi_RSiDIXznojX0jMbol-XFks5s291NgaJpZM4QcEpg .

magicDGS commented 6 years ago

@robmaz - please, open a different PR without the cleanup with the ReadTools change if you wanna my help there. In addition, the trimming should be modify in that PR too if the required version is higher or equal to 1.1.0

We can talk about it in person when I come back. Keep this PR only for cleanup until now (that is, revert all the changes made for ReadTools, including removal of JDistmap).

magicDGS commented 6 years ago

In addition, we can create an smaller ReadTools for distribution with DistMap without packaging Hadoop dependencies. But we should talk about it later... I guess that another idea is to create a java DistMap, but it will require more time (I have already a draft about it, but we should talk after you get familiar with the pipeline).