loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0
34 stars 1 forks source link

In ingest, populate the version comment field with a diff versus the previous version #2804

Open corneliusroemer opened 3 days ago

corneliusroemer commented 3 days ago

In ingest, we should add the diff between previous and new version to the version comment field. Ideally we'll do this before we merge:

theosanderson commented 3 days ago

As Emma sort of mentioned in the chat on the call - I don't think an automated diff is desirable for version comments. I think these should be descriptive (like commit messages) and should describe intent. So a useful message here would be something like Author order used to be automatically resorted by Pathoplexus ingest pipeline, now instead it reflects INSDC order (see github.com/bla). I appreciate figuring out how to do that isn't necessarily easy, but if we can't do it I still wouldn't go for an automated diff as this message - I think if we want that we should do it with the comparison feature.

corneliusroemer commented 3 days ago

I think ingest is special in this case, as it's automated. It's true that it'd be nicer to have a human description there when the change is due to ingest code changes. But a diff isn't a problem if the changes are due to upstream changes?

You're right that a diff would make the comment less useful.

But I don't know how to automatically add such a custom comment in ingest - we could do it manually in the db? Or we do it with configuration passed to ingest? We run it once with the version comment, then turn the comment off a few minutes later?

theosanderson commented 3 days ago

Yes, I was just thinking that a manual DB operation (pseudocode) UPDATE SET JSON.versionComment = 'bla' WHERE date_submitted IN LAST 3 HOURS would probably work pretty well.

I think in the case of upstream changes - which are not the case here! - then a useful automated comment might be Authors field updated on INSDC or something

corneliusroemer commented 3 days ago

Right - so we are not opposed to version comment in principle, just in this case here it's best to add better context than a diff would give.

chaoran-chen commented 3 days ago

As we plan to implement a diff viewer, I think that something that indicates that a change has been made automatically due to an update on INSDC makes sense. This would distinguish it from manual curation changes.