robmaz / distmap

Sequence alignment on Hadoop
0 stars 1 forks source link

Remove picard for mapping step #80

Closed magicDGS closed 6 years ago

magicDGS commented 6 years ago

Removes the usages of picard in the mapping step and cleaup the mapper-scripts, removing the argument --output-format (unused in all of them or unused the output file).

Closes #102 Part of #78

magicDGS commented 6 years ago

@robmaz - can you have a look to this? I guess that we can't really get rid of picard if it is necessary to sort the reads to convert into BAM.

robmaz commented 6 years ago

Probably not, but I have not looked at the more exotic mapper scripts in detail. The Big Three at least produce sam output, which gets gzipped, and then turned into bam only at the sorting/merging step (as you of course know).

Cheers Rupert

PS. I still find it fascinating how Ram never fixed any typo in a variable name, but rather kept it in all later occurrences.

2018-07-18 13:31 GMT+02:00 Daniel Gómez-Sánchez notifications@github.com:

@magicDGS commented on this pull request.

In bin/soap_mapping.pl https://github.com/robmaz/distmap/pull/80#discussion_r203343373:

@@ -144,6 +141,7 @@ }

if ($output_format =~ /bam/i) {

  • TODO: WARNING: this won't work anymore and will fail!!!

    my $cmd2 = "$hadoop jar $sartsam_jar I=$sam_output O=$bam_output SO=coordinate VALIDATION_STRINGENCY=SILENT";

@robmaz https://github.com/robmaz - do we really need this to convert to BAM?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/robmaz/distmap/pull/80#discussion_r203343373, or mute the thread https://github.com/notifications/unsubscribe-auth/Ad_FfP17vcRK4YUe1T5m3jtG9ZWoOXjhks5uHxyMgaJpZM4SVDM4 .

robmaz commented 6 years ago

Removing all the dead code paths in distmap is a project for itself and I would not really bother. I guess here it might make some sense, though, since we just removed the old jars.

Cheers Rupert

2018-07-18 14:12 GMT+02:00 Daniel Gómez-Sánchez notifications@github.com:

@magicDGS commented on this pull request.

In bin/soap_mapping.pl https://github.com/robmaz/distmap/pull/80#discussion_r203353673:

@@ -144,6 +141,7 @@ }

if ($output_format =~ /bam/i) {

  • TODO: WARNING: this won't work anymore and will fail!!!

    my $cmd2 = "$hadoop jar $sartsam_jar I=$sam_output O=$bam_output SO=coordinate VALIDATION_STRINGENCY=SILENT";

I just realized that in the rest of scripts, this lines were commented...and that the output_format was unused. Should I remove also here?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/robmaz/distmap/pull/80#discussion_r203353673, or mute the thread https://github.com/notifications/unsubscribe-auth/Ad_FfEzUUt3l98OaGM_2rRYUe_aEqpDFks5uHyYngaJpZM4SVDM4 .

magicDGS commented 6 years ago

Now this one is ready for review @robmaz - I think that this was the critical point to remove picard.

magicDGS commented 6 years ago

@robmaz - as it looks that you are trying the new distmap out, can you please have a look to this?