[discussion] Is the distributed version control approach through memote relevant?

opencobra / memote

memote – the genome-scale metabolic model test suite

https://memote.readthedocs.io/

Apache License 2.0

[discussion] Is the distributed version control approach through memote relevant? #258

Open ChristianLieven opened 7 years ago

ChristianLieven commented 7 years ago

Problem description

I don't believe this framework is general enough to be appropriate for the wide range of modeling tools that the community will develop. For example, Pathway Tools uses a database to store its models during development, not a text file, so a version control system is of no use to us. It is important to de-couple model distribution from model development. One might use SBML for model distribution but not use it for model development. In fact, does anyone really use SBML for model development? Do people hand edit XML files when developing models? This seems unlikely to me; people should be using higher level tools to develop their models that emit SBML when the model is ready to be published or distributed. So again I really don't see the distributed version control approach as relevant.

Peter Karp, comment on the manuscript

I can only speak from my personal experience, which, granted, might be quite limited in comparison to that of other co-authors.

However, for my thesis' main project I started from an automated draft. Before I was able to begin with the curation, I had to make several global changes to the .xml file, which I did by editing it directly. Then I curated the model in sets of cohesive steps with cobrapy; at the end of each set I would export the model to SBML.

I agree that version control of a database is no simple task. As far as I'm aware Pathway Tools supports an SBML export. So in theory one could apply the same workflow of emitting intermittent versions of the database as SBML, which would make version control feasible.

We had hoped that by choosing SBML as the primary format we would stay agnostic to the differences between each tool. Since memote doesn't introduce any changes to a model, a re-import is unnecessary. Could you explain why you think it may not be decoupled?

[...] I would like to move this discussion to GitHub so that we can discuss this with the whole community. I think it would be quite valuable to learn exactly what the users habits are, and then take it from there.

My initial responses on the manuscript

pkarp111 commented 7 years ago

Yes, I see your point that some people may choose to generate SBML from their model development tool and put that within a version control system, however, that approach is quite problematic. The reason is that when a given tool generates an SBML export, there is no guarantee that portions of the file will be in the same order as in the previous export. Hence, a small edit to the model could yield a file in which the majority of its contents have been scrambled from the previous version, yielding a huge number of differences and making the version control system useless.
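
The effect described here can be illustrated with a small sketch. It is only a toy: the "model" is a hypothetical dictionary of reaction ids and equations, and the `export` function is a stand-in for whatever a real tool does when it writes SBML. It compares the size of a line-based diff when the same one-reaction edit is written in the original order versus a shuffled order.

```python
import difflib
import random

# Hypothetical toy "model": reaction id -> equation string (not real SBML).
reactions = {f"R{i:03d}": f"m{i} --> m{i + 1}" for i in range(50)}


def export(model, order):
    """Write one reaction per line in the given order (stand-in for an SBML export)."""
    return [f"{rid}: {model[rid]}" for rid in order]


def changed_lines(old, new):
    """Count added and removed lines in a unified diff, ignoring the file headers."""
    diff = difflib.unified_diff(old, new, lineterm="", n=0)
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


original_order = list(reactions)
first_export = export(reactions, original_order)

# One small edit to the model ...
edited = dict(reactions)
edited["R010"] = "m10 --> m11 + h2o"

# ... written once in the original order and once in a scrambled order.
stable_export = export(edited, original_order)
shuffled_order = original_order[:]
random.Random(0).shuffle(shuffled_order)
scrambled_export = export(edited, shuffled_order)

stable_count = changed_lines(first_export, stable_export)
scrambled_count = changed_lines(first_export, scrambled_export)
print("diff lines, stable order:   ", stable_count)
print("diff lines, scrambled order:", scrambled_count)
```

With a stable order the diff contains only the removed and the added version of the edited line; with a scrambled order most of the file shows up as changed, which is the problem raised above.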

ChristianLieven commented 7 years ago

That's very true and a problem we've deliberated over for quite some time. Currently, we address this issue by auto-generating a sorted YAML file through memote for each commit at which the original SBML file has been modified. This circumvents the issues with scrambling, while also providing a 'fairly' readable file-diff.

You can see an example here: https://github.com/ChristianLieven/memote-demo/commit/3e64be6e1a39bc5aa9ed1f4159cba3fd62b26b80

For a short time we also considered customizing the git-compare view, but we found that the sorted YAML file works well enough. However, further testing is definitely necessary to arrive at a solution which runs comfortably fast on each commit. (There are still some minor bugs with it, too: https://github.com/opencobra/memote/issues/183)

pkarp111 commented 7 years ago

Could you explain that a bit more? What is the relationship of the YAML file to the SBML file? Why use a different file format?

ChristianLieven commented 7 years ago

Sure! Note that @Midnighter implemented the conversion in cobrapy using the ruamel package, and might therefore do a better job explaining the technical details.

What is the relationship of the YAML file to the SBML file?

YAML is essentially a cleaner JSON-format

When compared with XML it is much less verbose as you can see here

Could you explain that a bit more?

So internally what happens: a pre-commit hook is triggered, which

1. imports the SBML at this point with cobrapy's read_sbml_model() function,
2. and then exports it by running cobrapy's save_yaml_model() function.

This yields a sorted YAML file.

Why use a different file format?

In short, to make human-readable, line-based diffing possible. You may also find this discussion insightful.
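
The idea behind the hook can be sketched with the standard library alone. This is not the memote implementation: the real hook uses the two cobrapy functions named above, whereas the sketch below assumes a made-up XML layout (not real SBML) and writes sorted JSON instead of YAML, purely to show why a sorted re-export gives a stable file for diffing. The names `EXPORT_A`, `EXPORT_B`, and `canonical_text` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Two exports of the same toy model with the elements in a different order.
# The XML layout is a made-up stand-in, not real SBML.
EXPORT_A = """<model>
  <reaction id="PGI" lower="-1000" upper="1000"/>
  <reaction id="PFK" lower="0" upper="1000"/>
  <metabolite id="g6p_c"/>
  <metabolite id="f6p_c"/>
</model>"""

EXPORT_B = """<model>
  <metabolite id="f6p_c"/>
  <reaction id="PFK" lower="0" upper="1000"/>
  <metabolite id="g6p_c"/>
  <reaction id="PGI" lower="-1000" upper="1000"/>
</model>"""


def canonical_text(xml_text):
    """Parse an export and re-serialize it with every list sorted by id."""
    root = ET.fromstring(xml_text)
    model = {
        "metabolites": sorted(
            (dict(el.attrib) for el in root.iter("metabolite")),
            key=lambda item: item["id"],
        ),
        "reactions": sorted(
            (dict(el.attrib) for el in root.iter("reaction")),
            key=lambda item: item["id"],
        ),
    }
    # sort_keys plus a fixed indent gives one attribute per line in a stable order.
    return json.dumps(model, indent=2, sort_keys=True)


canonical_a = canonical_text(EXPORT_A)
canonical_b = canonical_text(EXPORT_B)
print("raw exports identical:      ", EXPORT_A == EXPORT_B)
print("canonical exports identical:", canonical_a == canonical_b)
```

In a hook, the canonical text would be written to a file next to the SBML and committed with it, so that the line-based diff only shows entries whose content actually changed.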

rmtfleming commented 6 years ago

There is no doubt that SBML is the standard for model interchange. However, as alluded to by others, it is not clear that a structured text file can outcompete a well-designed database when dealing with changes and updates to different versions of a model. As such, git on top of a text file is one implementation approach for version control of a model, which is laudable, and necessary in principle, but not the only means, nor necessarily the optimal approach to the problem. It may be, however, the one that is most immediately accessible to a broad community, and that could be the main strength of the way memote implements version control of a reconstruction.

phantomas1234 commented 6 years ago

Exactly that 👉 "It may be, however, the one that is most immediately accessible to a broad community, and that could be the main strength of the way memote implements version control of a reconstruction." Of course databases should be preferred over handling data in files. They come at a cost, though, as can be seen with BioCyc having moved to a subscription model recently. With the GitHub-based workflow (doesn't cost a dime), which is actually a small part of memote, we're just trying to provide a very practical solution to a problem. Is it the optimal approach? Probably not. In the absence of a well-designed and freely accessible database for model reconstruction, this point is moot. If anyone is already tracking model changes in the reconstruction process in a database system, please let us know. We'd be happy to work with you in extending memote appropriately for integration purposes.

bdelepine commented 6 years ago

@phantomas1234 What do you think about MetExplore?

pkarp111 commented 6 years ago

Just to be clear, yes, BioCyc has moved to a subscription model, but the underlying software, Pathway Tools, is still free to academics. It offers a free, database-based version control system for metabolic models.

Peter Karp


matthiaskoenig commented 6 years ago

Hi all,

I am a very strong supporter of a github based approach using structured text files (one can still have a database backend working in sync with the text files to get the best of both worlds). This is in my opinion the only way to get a community curated model and community curation on models. Otherwise it is very complicated to get changes/fixes into models. See for instance RECON3D where I found some errors but there is no clear platform for reporting these issues or even fixing them. With a text based github approach these fixes could be immediately available to the community via a pull request.

I would love to fix things like the following reported issues in RECON3D https://github.com/SBRG/bigg_models/issues/291 https://github.com/SBRG/bigg_models/issues/290 https://github.com/SBRG/bigg_models/issues/287

After more than 20 days of reporting the first issue of an incorrect gene in RECON3D these are still not fixed in the underlying database http://vmh.uni.lu/#reaction/HDCA24Gtr whereas with a text file and a pull request everybody could immediately be working with an updated Recon v3.02.

The only way to get the community to work together and use the curation potential of all scientists working with models is a fully open easy to update format. GitHub with structured text files is an established and working approach.

I really hope the community is working in this direction instead of hiding information in in-house databases.

Best Matthias



pkarp111 commented 6 years ago

Hello, Technically speaking, your comments are clearly incorrect. See below. I think you would do better to try to compare and contrast the strengths and weaknesses of different technical approaches than to make false claims about one approach being the only way to go.

On 3/19/18 1:44 AM, Matthias König wrote:

"I am a very strong supporter of a github based approach using structured text files (one can still have a database backend working in sync with the text files to get the best of both worlds). This is in my opinion the only way to get a community curated model and community curation on models."

The ONLY way? Clearly that statement is incorrect. If a model is stored in a database, multiple users can connect to that database to update it. We have had such a multi-user database update situation working for approximately 20 years, for example with our EcoCyc database. Obviously this is not even new technology.

"Otherwise it is very complicated to get changes/fixes into models. See for instance RECON3D where I found some errors but there is no clear platform for reporting these issues or even fixing them. With a text based github approach these fixes could be immediately available to the community via a pull request."

"I would love to fix things like the following reported issues in RECON3D https://github.com/SBRG/bigg_models/issues/291 https://github.com/SBRG/bigg_models/issues/290 https://github.com/SBRG/bigg_models/issues/287"

"After more than 20 days of reporting the first issue of an incorrect gene in RECON3D these are still not fixed in the underlying database http://vmh.uni.lu/#reaction/HDCA24Gtr whereas with a text file and a pull request everybody could immediately be working with an updated Recon v3.02."

I'm sorry to hear that RECON3D has not implemented your suggested fixes. That is irrelevant to the question of whether a git-based approach is the only approach.

"The only way to get the community to work together and use the curation potential of all scientists working with models is a fully open easy to update format. GitHub with structured text files is an established and working approach."

Yes, it is an established and working approach, but to say it is the only approach is incorrect. There are other established and working approaches, like a database approach.

You speak here of the need for "a fully open easy to update format". The XML-based formats used by many model developers are not easy to update and are not at all git-friendly. That is, if two developers end up committing conflicting changes, deciphering and resolving the git conflict messages will be a nightmare; probably it will be beyond the capabilities of many model builders to understand these messages. Therefore one could argue that the approach you propose is not workable if resolving conflicts becomes impossible because you are misusing source-code control systems for complex file formats they were not designed to work with.

"I really hope the community is working in this direction instead of hiding information in in-house databases."

It is strange that you use the phrases "hiding information" and "in-house databases". Yes, to update the shared database, a user would have to be granted access to the database. Just as to update a shared git repository, a user has to be granted access to the repository. So one could just as well accuse a git-based project of "hiding information in a private repository".


matthiaskoenig commented 6 years ago

Hi Peter,

my formulation was probably too harsh. Of course database systems can work, but until now I have not seen a flexible enough open implementation which works for me. But this is clearly a personal opinion.

I just find it very difficult to get information into such databases. For instance, I want to add a citation to the following entry, i.e. a third citation next to [Fukumoto88]: https://biocyc.org/gene?orgid=HUMAN&id=HS08885 How would I do this? In a text based format with a git backend I would select "edit entry", commit the change directly or via a pull request, and the responsible person could merge the change. How would this work in the case of HumanCyc? How are you tracking changes to the entries in the database? Is there a complete log of changes by whom (for attribution)? Both the changing of information and attribution are "for free" with a git based system.

Just as to update a shared git repository, a user has to be granted access to the repository. So one could just as well accuse a git-based project of "hiding information in a private repository".

This is not true. Everybody can make a pull request, no login, user data, account or anything needed. You can promote users to trusted committers, but everybody can propose changes in a simple manner.

The XML-based formats used by many model developers are not easy to update and are not at all git-friendly.

I completely agree. They are good as exchange formats, but not suited to track changes.

I really hope the community is working in this direction instead of hiding information in in-house databases.

Yes, this happens independent of the technology used, but is dependent on the license of the data. There are open databases like Reactome (CC0) and there are private repositories, and vice versa. Unfortunately, this is the impression I get when looking at the subscription model behind HumanCyc. Why should I curate an entry in HumanCyc if it is subsequently licensed to someone else and only accessible after others pay?

Best Matthias

pkarp111 commented 6 years ago

Dear Matthias,

On 3/20/18 11:07 AM, Matthias König wrote:

"I just find it very difficult to get information into such databases. For instance, I want to add a citation to the following entry, i.e. a third citation next to [Fukumoto88]: https://biocyc.org/gene?orgid=HUMAN&id=HS08885 How would I do this? In a text based format with a git backend I would select 'edit entry', commit the change directly or via a pull request, and the responsible person could merge the change. How would this work in the case of HumanCyc?"

We have a GUI editing interface within Pathway Tools. There is a different forms-based editing tool for every datatype we support, e.g., for proteins, reactions, metabolites, pathways, etc. For each one of those datatypes, there is a separate interactive dialog that shows the user the current values of various fields in the database, allows the user to change those fields, and then saves the edits into the database. These are the tools our biologist curators use to update the database.

"How are you tracking changes to the entries in the database? Is there a complete log of changes by whom (for attribution)? Both the changing of information and attribution are 'for free' with a git based system."

We have a complete log of all update transactions, stored within the database. It can be queried via SQL. Yes, it includes a record of who made the update.
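
As a rough illustration of what querying such an update log via SQL could look like, here is a minimal sketch using SQLite. The table name `update_log`, its columns, the helper functions, and all rows (including the citation keys) are hypothetical; the actual Pathway Tools schema is not described in this thread.

```python
import sqlite3

# Hypothetical audit-log schema and data; not the real Pathway Tools tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE update_log (
        id         INTEGER PRIMARY KEY,
        object_id  TEXT NOT NULL,
        field      TEXT NOT NULL,
        old_value  TEXT,
        new_value  TEXT,
        curator    TEXT NOT NULL,
        updated_at TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO update_log "
    "(object_id, field, old_value, new_value, curator, updated_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("RXN-1", "citations", "[A88]", "[A88, B95]", "alice", "2018-03-01T10:00:00"),
        ("RXN-1", "name", "old name", "new name", "bob", "2018-03-05T09:30:00"),
        ("GENE-7", "product", None, "kinase", "alice", "2018-03-06T14:15:00"),
    ],
)


def history(connection, object_id):
    """Return all logged edits to one object, oldest first."""
    cursor = connection.execute(
        "SELECT updated_at, curator, field, old_value, new_value "
        "FROM update_log WHERE object_id = ? ORDER BY updated_at",
        (object_id,),
    )
    return cursor.fetchall()


def edits_per_curator(connection):
    """Return (curator, number of edits) pairs, for attribution."""
    cursor = connection.execute(
        "SELECT curator, COUNT(*) FROM update_log GROUP BY curator ORDER BY curator"
    )
    return cursor.fetchall()


for row in history(conn, "RXN-1"):
    print(row)
print(edits_per_curator(conn))
```

The first query plays the role of a per-object change history and the second gives per-person attribution, the two things asked about above.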

"Just as to update a shared git repository, a user has to be granted access to the repository. So one could just as well accuse a git-based project of 'hiding information in a private repository'."

"This is not true. Everybody can make a pull request, no login, user data, account or anything needed. You can promote users to trusted committers, but everybody can propose changes in a simple manner."

I thought there were private projects in git? Maybe not. So OK, changes can be proposed in a simple manner (if you call it simple for someone to understand the XML well enough to make a change), but still to commit directly the person editing must be authorized.

"The XML-based formats used by many model developers are not easy to update and are not at all git-friendly."

"I completely agree. They are good as exchange formats, but not suited to track changes."

And there is another serious limitation of exchange formats, I think raised earlier in this thread, which is that if the file is written by a software tool (e.g., an SBML editing tool), there is no guarantee that the tool will write the file with objects (e.g., reactions) in the same order or with line breaks or other whitespace in the same place. These syntactic changes to the file could cause thousands of conflicts and render git-based change tracking useless. So all tools used for editing must be controlled carefully to be sure they do not misbehave in this sense.

"I really hope the community is working in this direction instead of hiding information in in-house databases."

"Yes, this happens independent of the technology used, but is dependent on the license of the data. There are open databases like Reactome (CC0) and there are private repositories, and vice versa. Unfortunately, this is the impression I get when looking at the subscription model behind HumanCyc. Why should I curate an entry in HumanCyc if it is subsequently licensed to someone else and only accessible after others pay?"

OK, now you are getting into non-technical issues, which is fine. I've tried for 20 years to convince government funding agencies to fund curation of the many organisms they have paid to sequence, but they refuse to do it; they only support curation of a small number of organisms. BioCyc is now exploring use of a different model, the subscription model, to raise funds for curation. We do not make a profit from BioCyc subscription revenues; they are all re-invested into BioCyc itself to create a stronger resource for the scientific community. Just as libraries buy subscriptions from journals and scientists write journal articles that are only available to subscribers, we are using a similar approach to fund BioCyc.

Best wishes,

Peter


phantomas1234 commented 6 years ago

Hi @pkarp111, what would be the next steps for integrating memote into Pathway Tools then?

pkarp111 commented 6 years ago

Hi, I'm not sure I understand the question. First I will say that I have been surprised that the memote manuscript is concerned with collaborative development of metabolic models. Based on the title and abstract, memote is concerned with developing a standard set of metabolic model tests. In my view, the manuscript should focus on that well defined topic and not drift to the topic of collaborative development.

If you mean how should the memote tests be integrated with Pathway Tools, well, one approach would be to run the tests on the SBML that Pathway Tools can generate. Right now our SBML is not of very high quality, but we hope to improve that in the coming months.

Another approach would be to implement the memote tests directly against a Pathway Tools model/database using one of our APIs. That would take some programming effort.

Peter


BenjaSanchez commented 6 years ago

Hi all, thought it would be interesting to mention here that we just made public on GitHub the yeast consensus GEM for S. cerevisiae. It started from a previous model (Yeast7), so it's by no means proof that reconstruction from scratch using git works, but at least it shows that it's quite possible to maintain the updates to a model through git and GitHub, and allow people to send PRs to improve it. See for instance PR https://github.com/SysBioChalmers/yeast-GEM/pull/80: The changes in the .xml might be messy and hard to read, but we also have a small .txt file that displays easily what changed and makes it easy for the admin to approve/reject changes. One can support the changes further with any .tsv file indicating the corresponding reasons/citations/etc, and a script that makes the changes from model A to model B. We also just integrated a .yml file, so changes to any field are now easily tracked, as @ChristianLieven mentions. Other advantages of keeping a GEM tracked on GitHub:

- Issues can easily be tracked, which avoids people repeating themselves.
- The simplified Gitflow structure allows people to work on different parts of the model without creating conflicts (although sometimes they do happen and then the admin rejects the PR, see https://github.com/SysBioChalmers/yeast-GEM/pull/62).
- The devel branch keeps only fully traceable flat files, but releases also contain binaries such as .mat and .xlsx, for people who just want to use the model.
- Projects show the community what the developers are working on, in case someone wants to join.

Again as @phantomas1234 said, this is of course not the only way to go, but it cannot be denied that it has a bunch of advantages in terms of collaborative development that other systems don't fully have.

As a final remark, any yeast modeler out there that would like to open issues or send us PRs to improve the model, please do so!
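
The small text summary of "what changed" between model A and model B mentioned above could be produced along these lines. This is only a sketch under simplifying assumptions: the models are hypothetical dictionaries mapping reaction ids to equation strings, and `summarize_changes` is an illustrative name, not a function from yeast-GEM or memote.

```python
def summarize_changes(old, new):
    """List added, removed, and modified reactions between two model versions.

    Both arguments map reaction ids to equation strings. Ids are compared as
    sets, so the order in which a tool wrote the reactions does not matter.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(
        rid for rid in set(old) & set(new) if old[rid] != new[rid]
    )
    lines = []
    lines += [f"added    {rid}: {new[rid]}" for rid in added]
    lines += [f"removed  {rid}: {old[rid]}" for rid in removed]
    lines += [f"modified {rid}: {old[rid]} => {new[rid]}" for rid in modified]
    return lines


# Hypothetical toy versions of a model.
model_a = {
    "r_0001": "a --> b",
    "r_0002": "b --> c",
    "r_0003": "c --> d",
}
model_b = {
    "r_0003": "c --> d",
    "r_0002": "b + h2o --> c",
    "r_0004": "d --> e",
}

summary = summarize_changes(model_a, model_b)
print("\n".join(summary))
```

A summary like this is independent of file ordering and whitespace, which is what makes it easier to review than the raw .xml diff.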

pkarp111 commented 6 years ago

What tools were used to edit the file?

How did you solve the problem that the tools might emit reactions or other components of the file in different orders?

On 4/16/18 3:54 AM, Benjamín Sánchez wrote:

Hi all, thought it would be interesting to mention here that we just made public in Github the yeast consensus GEM https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSysBioChalmers%2Fyeast-GEM&data=01%7C01%7Cpkarp%40AI.SRI.COM%7Cbc14514f4ee74a57e79208d5a3887e2a%7C40779d3379c44626b8bf140c4d5e9075%7C1&sdata=T2qLfIUNYLLNOS1IcYbrmkKc4MvjvxDefuaxV%2FUk%2Bvw%3D&reserved=0 for /S. cerevisiae/. It started from a previous model (Yeast7) so it's by no means proof that reconstruction from scratch using git works, but at least it shows that it's quite possible to maintain the updates to a model through git and Github, and allow people to send PRs to improve it. See for instance PR SysBioChalmers/yeast-GEM#80 https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSysBioChalmers%2Fyeast-GEM%2Fpull%2F80&data=01%7C01%7Cpkarp%40AI.SRI.COM%7Cbc14514f4ee74a57e79208d5a3887e2a%7C40779d3379c44626b8bf140c4d5e9075%7C1&sdata=2UNj2Sce8Eyg0Ve6wss6gaxR372mQZKj%2FOqmmPNWBaQ%3D&reserved=0: The changes in the |.xml| might be messy and hard to read, but we have also a small |.txt| file that displays easily what changed and makes it easy for the admin to approve/reject changes. One can support the changes further with any |.tsv| file indicating the corresponding reasons/citations/etc, and a script that makes the changes from model A to model B. We just integrated also a |.yml| file, so changes to any field are now easily tracked, as @ChristianLieven https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FChristianLieven&data=01%7C01%7Cpkarp%40AI.SRI.COM%7Cbc14514f4ee74a57e79208d5a3887e2a%7C40779d3379c44626b8bf140c4d5e9075%7C1&sdata=4A5Z%2BrmrFccKNrafjV%2BXxs1ozLB8LNPxMGfCO8TCYkc%3D&reserved=0 mentions. Other advantages of keeping a GEM tracked in Github:

Issues (https://github.com/SysBioChalmers/yeast-GEM/issues) can easily be tracked, which avoids people repeating themselves.

The simplified Gitflow structure (https://github.com/SysBioChalmers/yeast-GEM/network) allows people to work on different parts of the model without creating conflicts (although sometimes they do happen and then the admin rejects the PR, see SysBioChalmers/yeast-GEM#62, https://github.com/SysBioChalmers/yeast-GEM/pull/62).

The devel branch (https://github.com/SysBioChalmers/yeast-GEM/tree/devel) keeps only flat files, which are fully traceable, but releases (https://github.com/SysBioChalmers/yeast-GEM/releases) also contain binaries such as .mat and .xlsx, for people who just want to use the model.

Projects (https://github.com/SysBioChalmers/yeast-GEM/projects) show the community what the developers are working on, in case someone wants to join.

Again, as @phantomas1234 said, this is of course not the only way to go, but it cannot be denied that it has a bunch of advantages in terms of collaborative development that other systems don't fully have.

As a final remark, any yeast modeler out there that would like to open issues or send us PRs to improve the model, please do so!


draeger commented 6 years ago

@BenjaSanchez this looks very good!

Maybe SBTab could be a good solution for a git-based community reconstruction project. It is compatible with the standard format SBML but can be stored in simple CSV (or TSV) files that people can edit in their spreadsheet software of choice. For Excel, there is even an add-on that imports and exports SBML directly. The format is row-based and can therefore work with line-based version control systems such as git. It is also much easier to edit than the XML-based formats.

BenjaSanchez commented 6 years ago

@pkarp111 Every change is made through Matlab (but we will soon work on integration with cobrapy as well). The model is saved as .xml and .txt by COBRA's writeCbModel.m. The .yml file is sorted every time and saved by a RAVEN function, so there will never be any scrambling in that file. That being said, we have been using the model for a while and so far the .xml has not been scrambled. The format does change now and then (mostly due to COBRA), but then we can make a single commit, "chore: update COBRA", which shows that even if the .xml file changes, the .txt and .yml remain the same. So I would say scrambling is not an issue anymore.

@draeger SBTab sounds good and we will consider .tsv as an additional format to engage more users, thank you!
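
The effect of sorting before saving, as described above for the .yml file, can be illustrated with a minimal Python sketch. This is not the RAVEN function: it uses JSON from the standard library as a stand-in for YAML, and the reaction records, ids, and the `canonical_dump` helper are hypothetical. It only shows that a fixed sort order makes two exports with differently ordered components produce an empty diff.

```python
import difflib
import json


def canonical_dump(reactions):
    """Serialize reactions in a fixed order: records sorted by id, keys sorted."""
    ordered = sorted(reactions, key=lambda r: r["id"])
    return json.dumps(ordered, indent=2, sort_keys=True)


# Hypothetical toy records; ids and field values are made up for illustration.
export_a = [
    {"id": "r_0002", "gene_reaction_rule": "G2", "lower_bound": 0},
    {"id": "r_0001", "gene_reaction_rule": "G1 or G3", "lower_bound": -1000},
]
# Same content, but the tool emitted the reactions in a different order.
export_b = list(reversed(export_a))

# Diff of the exports exactly as emitted (order differs, so the diff is noisy).
raw_diff = list(difflib.unified_diff(
    json.dumps(export_a, indent=2).splitlines(),
    json.dumps(export_b, indent=2).splitlines(),
    lineterm=""))

# Diff after canonical sorting (identical text, so the diff is empty).
sorted_diff = list(difflib.unified_diff(
    canonical_dump(export_a).splitlines(),
    canonical_dump(export_b).splitlines(),
    lineterm=""))

print("diff lines without sorting:", len(raw_diff))
print("diff lines with sorting:", len(sorted_diff))
```

With a fixed order, a version control system only reports lines whose content really changed, regardless of the order in which a tool happened to emit them.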

bdelepine commented 6 years ago

@BenjaSanchez Hi! Thank you for sharing your experience. I was wondering: What is the advantage of sorting the .yml instead of the .xml (inside each listOfX)? And is there any particular reason for not using COBRA's method to export to .yml?

BenjaSanchez commented 6 years ago

@bdelepine The main advantage is readability: a single change of a value translates into one changed line in the .yml but several in the .xml; see as an example this commit that changes gene rules in the .xml versus this commit that changes gene rules in the .yml.

I thought COBRA toolbox did not offer exporting to .yml, but maybe this has changed?
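
To make the readability point concrete, here is a small Python sketch that counts how many lines are new or changed when a gene rule goes from "G1" to "G1 or G2" in a YAML-style record versus a simplified SBML-like fragment. The reaction id, gene names, and the `new_or_changed_lines` helper are hypothetical, and the XML is a trimmed illustration of a nested gene association, not a complete or validated SBML file.

```python
import difflib


def new_or_changed_lines(before, after):
    """Count lines in `after` that are not matched by an identical line in `before`."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return sum(j2 - j1
               for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


# YAML-style record: the whole rule lives on one line.
yml_before = [
    "- id: r_0001",
    "  gene_reaction_rule: G1",
]
yml_after = [
    "- id: r_0001",
    "  gene_reaction_rule: G1 or G2",
]

# Simplified XML fragment: adding a gene wraps the references in an extra element.
xml_before = [
    '<reaction id="r_0001">',
    "  <fbc:geneProductAssociation>",
    '    <fbc:geneProductRef fbc:geneProduct="G1"/>',
    "  </fbc:geneProductAssociation>",
    "</reaction>",
]
xml_after = [
    '<reaction id="r_0001">',
    "  <fbc:geneProductAssociation>",
    "    <fbc:or>",
    '      <fbc:geneProductRef fbc:geneProduct="G1"/>',
    '      <fbc:geneProductRef fbc:geneProduct="G2"/>',
    "    </fbc:or>",
    "  </fbc:geneProductAssociation>",
    "</reaction>",
]

yml_count = new_or_changed_lines(yml_before, yml_after)
xml_count = new_or_changed_lines(xml_before, xml_after)
print("new or changed lines in .yml-style text:", yml_count)
print("new or changed lines in .xml-style text:", xml_count)
```

On this toy input the YAML-style text has one changed line and the XML fragment has four, because the nesting and re-indentation touch lines whose meaning did not change.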

bdelepine commented 6 years ago

@BenjaSanchez The comparison of the two commits is indeed much more readable in the .yml.

I don't know about MATLAB, but COBRApy is able to export to .yml.

BenjaSanchez commented 6 years ago

@bdelepine indeed, we followed the standard of cobrapy for creating the .yml file. One of our hopes is that both the Matlab and Python communities could work on the same model without losing compatibility, and we will soon work to make this possible :)