kbaseattic / KBaseFBAModeling

Flux balance modeling service. Deploys both the client and service codes.
MIT License
6 stars 11 forks source link

string comparison in build_sequence_object #21

Open levinas opened 9 years ago

levinas commented 9 years ago

Is this line trying to sort sequences before computing an MD5? Should the operator be 'cmp' ?

https://github.com/kbase/KBaseFBAModeling/blob/master/lib/Bio/KBase/fbaModelServices/Impl.pm#L2457

mmundy42 commented 9 years ago

To me, it looks like the operator should be 'cmp' instead of ‘<=>’ since the sequence field of the contigobject is a string and not a number.

I’m not sure the impact of changing the operator. The _build_sequence_object() method is building either a ContigSet or ProteinSet object from a user provided fasta file of DNA sequences or protein sequences. The MD5 of all of the sequences is used to generate an ID for the new object using the register_ids() method from the ID server. It does not seem like that should be an issue since the object IDs are just a number.

The MD5 is also saved in the object — maybe that is used by the workspace service’s versioning support to understand when an object has changed and a new version needs to be created? I suppose there is a chance that a MD5 generated with the updated sort order could match a MD5 generated with the current sort order and cause the same ID to be returned and the workspace service to create a new version of an existing object instead of a new object.

I’m not seeing any impacts from the array of sequences being in a different order but maybe I’m missing something subtle.

Also, the cmp operator is used when doing the same processing on line 816 for building a ContigSet for a SEED genome and line 1000 for RAST genome.

Mike Mundy

From: Fangfang Xia notifications@github.com<mailto:notifications@github.com> Reply-To: kbase/KBaseFBAModeling reply@reply.github.com<mailto:reply@reply.github.com> Date: Friday, November 21, 2014 at 5:10 PM To: kbase/KBaseFBAModeling KBaseFBAModeling@noreply.github.com<mailto:KBaseFBAModeling@noreply.github.com> Subject: [KBaseFBAModeling] string comparison in build_sequence_object (#21)

Is this line trying to sort sequences before computing an MD5? Should the operator be 'cmp' ?

https://github.com/kbase/KBaseFBAModeling/blob/master/lib/Bio/KBase/fbaModelServices/Impl.pm#L2457

— Reply to this email directly or view it on GitHubhttps://github.com/kbase/KBaseFBAModeling/issues/21.

levinas commented 9 years ago

I agree. I think this should be "cmp" and changing this is unlikely to break anything.