Define goals [meta] #2

andru opened 8 years ago

andru commented 8 years ago

Seems to me we might discuss a little what our aims are for the project in order to inform the technical discussion and plans for implementation...

It would be good to try and get some interested parties together to go over some of this, to make sure we're building something which serves the broadest base possible within the confines of something achievable.

simonv3 commented 8 years ago

Thanks for taking this initiative @andru!

cc @elf-pavlik?

Here's my attempt at some answers.

elf-pavlik commented 8 years ago

Worth keeping in mind:

tkeifer commented 8 years ago

When thinking about goals, I usually try to answer these questions...

Applying this methodology, I think @simonv3 is off to a good start. We think there is a need for a more machine-friendly database of (at least) plant data, but we should take it a step further to gauge how much effort to put into it (and getting answers might involve reaching back out to the community).

This is a bit "stream of consciousness" but I think important things to ask before diving in too deep. The answer may end up being "there is no good reason, it'd just be fun" - which is fine too. :)

elf-pavlik commented 8 years ago
  • What makes my attempt better than others?

Also how it fits together with other attempts already present in the current ecosystem. IMO the README should list all the other known and relevant efforts. Besides AGROVOC, mentioned before, I also recall Practical Plants.

tkeifer commented 8 years ago

Great point. That site is a great reference; I had not come across it before. I've been studying it for a couple of hours, and there are a lot of interesting things we can learn from how they've approached it.

The thing that amazes me is the sheer depth and breadth of characteristics there are for even a narrow sampling of plants. Organizing all of that is going to be a real challenge.

I like the focus on "useful plants" - broader than vegetables, but less difficult to approach than the entirety of the natural kingdom.

andru commented 8 years ago

@elf-pavlik somehow I've not come across AGROVOC before. Thanks for sharing. Agreed on identifying similar projects early. Edit: see the wiki

@tkeifer I like your approach to identifying a problem and addressing needs. I'll try to be guided by it as I get some thoughts out.

Thanks for mentioning Practical Plants; it's a wiki I developed some time back and it's in large part my experience in pulling that project together which motivates a lot of my desire to see better open data available, particularly in the areas of names and physical and environmental data, and to see better tools for collectively stewarding that data.

It's also my work on Hortomatic, a work-in-progress agro-tech app, and my involvement in the ongoing semantic preparation of the Flora of North America that inform some of the direction I'd like to see on the technical side, but that's for a later conversation.

Here are my thoughts on some problems that need to be solved, and some goals and ethos for the project... stream-of-consciousness style while getting some of these thoughts out...

Problems to solve / niches to serve

Some ideas for Goals

Related thoughts

roryaronson commented 8 years ago

Great to see this conversation happening all :)

With FarmBot, at the very least, we'd like to access a database of common edible crop names and representative icons. While we could create and host that alone, we see a lot of value in pooling the effort to allow many apps to use/create/maintain the data. We're currently planning to use the OpenFarm API for this need, because eventually we want to use OpenFarm guide data as well.

Regarding centralized vs distributed, I think a hybrid approach would be cool. Distributed in the sense that different apps will exist in order to create special data around the crops/do things with the data for that specific application. And centralized in the sense that we maintain a 'canonical' data set that all apps are using/contributing to. As far as I understand, that's why we want to separate the crops db from OpenFarm guides in the first place. Let OpenFarm do guides, Hortomatic do garden planning/tracking, FarmBot do its thing, etc; and the crops db be the shared resource among them all.

roryaronson commented 8 years ago

As far as the types of data this crop db holds, I think it would be neat to not set out any limitations, but rather allow it to grow in any direction based on the overlapping needs of the beneficiaries. So for example, we might start out with just common names. Then we find out that two beneficiaries (FarmBot and Hortomatic) want icons. So those two communities can spearhead that component of the data set. Then we find out that group X and OpenFarm want to share photos. So those two can spearhead that.

Just spitballing here :)

mstenta commented 8 years ago

Hey everyone! Thanks for getting this conversation started!

I am building a farm management platform called farmOS, and I have been putting some thought into designing a standardized data type for storing cultivation data for various crops/varieties/species. It would be great if we could all work together on a common data format that is useful to everyone!

I agree with a lot of the things said above. Here are some thoughts I would add (or echo):

So really, I think what I am leaning towards is more of a "standard data format" than a specific collection. Perhaps the definition of that data format could be its own Git repository, and OpenFarm's collection of crops could be a separate one, which the OpenFarm web app refers to for information. (I am not very familiar with the OpenFarm architecture, so I'm not sure if that makes sense or not.)

The main challenge in collaborating on something like this is defining what actual data we all need represented, and where that is the same and where that is different. I'll organize my notes and post some of those details soon.

Excited to continue the conversation!

andru commented 8 years ago

Species in general - Plants are important, and maybe it makes sense to focus on them first, but I will also have a need for other taxonomies like animal species, fungal species, etc...

Distributed - Ultimately I don't see this being stored in one single collection or repository. There are just too many possible varieties out there. I see many many many different repositories, all with their own purposes.

I agree with your ideas on plurality. There is rarely a one-size-fits-all approach to data. To me, a good dividing line for purpose here is horticulture, keeping the scope, and by extension the schema, restricted to something manageable.

If the project works we could clone the model for similar databases covering related agricultural domains.

And if I decided to breed a new variety of tomato, I could make my own! Thus we could also create libraries of species, where one could pick-and-choose the ones that they care about.

I think there's absolutely scope for someone's new variety of tomato to be in this crop database, but I agree with your general point on the power of a distributed dataset. To function well I think distribution either needs a central authority (e.g. a git origin), as @roryaronson mentioned, or we would need to come up with some good standards to keep the data compatible, because merging data is messy.

@roryaronson I think you talk a lot of sense when it comes to what the database holds: that we grow it organically based on our needs. There's also some overlap with your thoughts here, Mike... if I need GDD (growing degree day) data for Hortomatic and nobody else does, then I could start a database with just GDD data, but use some shared standards on naming, schema, etc. to make that database compatible.
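To make the "GDD-only database" idea concrete, here is a minimal sketch assuming two independently maintained datasets that share nothing but a crop id. The ids, the `NAMES`/`GDD_BASE_C` tables, and the base temperatures are hypothetical placeholders, not vetted data; the daily calculation uses the simple averaging method, which is only one of several GDD formulas in use.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical names database: shared crop id -> English common name.
NAMES: Dict[str, str] = {
    "solanum-lycopersicum": "Tomato",
    "zea-mays": "Maize",
}

# Hypothetical GDD database, maintained separately but keyed by the same ids.
# Base temperatures (degrees C) are illustrative placeholders, not vetted data.
GDD_BASE_C: Dict[str, float] = {
    "solanum-lycopersicum": 10.0,
    "zea-mays": 10.0,
}


def daily_gdd(t_min: float, t_max: float, t_base: float) -> float:
    """Simple averaging method: max(0, (Tmax + Tmin) / 2 - Tbase)."""
    return max(0.0, (t_max + t_min) / 2 - t_base)


def accumulated_gdd(crop_id: str, days: Iterable[Tuple[float, float]]) -> float:
    """Sum daily GDD over (t_min, t_max) pairs using the crop's base temperature."""
    t_base = GDD_BASE_C[crop_id]  # raises KeyError if the GDD db lacks this id
    return sum(daily_gdd(t_min, t_max, t_base) for t_min, t_max in days)


def describe(crop_id: str, days: Iterable[Tuple[float, float]]) -> str:
    """Join the two databases purely through the shared id."""
    return f"{NAMES[crop_id]}: {accumulated_gdd(crop_id, days):.1f} GDD"
```

For example, `describe("zea-mays", [(8, 20), (12, 26), (5, 11)])` returns `"Maize: 13.0 GDD"`. Neither table needs to know about the other; compatibility comes entirely from agreeing on the id.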

andru commented 8 years ago

Actually, rolling on from that last comment. Maybe we should be making multiple, distinct databases which share a common naming scheme, each covering a small purpose.

These would be a product of our collective needs but for example...

A shared naming scheme is the tricky part. There are often multiple ways to refer to a crop depending on the nomenclature used, so we'd have to come up with some strict rules on which nomenclature gets used where. In a single database this is usually solved by just using an arbitrary id, but I think the id should be human readable in this case.
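One possible human-readable id rule, sketched under the assumption that ids are built from the binomial name plus an optional cultivar name: lowercase everything, strip accents, and collapse punctuation and spaces into hyphens. The `crop_id` function and the rule itself are hypothetical; a real scheme would also need a policy for synonyms, and note that letters with no ASCII decomposition are simply dropped here.

```python
import re
import unicodedata
from typing import Optional


def crop_id(binomial: str, cultivar: Optional[str] = None) -> str:
    """Build a lowercase, hyphenated id from a binomial name and optional cultivar."""
    text = binomial if cultivar is None else f"{binomial} {cultivar}"
    # Decompose accented letters and drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumerics (spaces, cultivar quotes) with one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")
```

With this rule, `crop_id("Brassica oleracea", "'Early Purple Sprouting'")` gives `brassica-oleracea-early-purple-sprouting`, which stays readable while being safe to use in URLs and file names.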

pmackay commented 8 years ago

A few questions/comments:

There is some great content on goals in this issue; would it be worth extracting it into a wiki page, separate from the conversation? It would also help to separate possible tech solutions and preferences from user needs, IMHO. Could a set of user stories be developed? e.g.

As a food/farming website developer, I need access to a simple API for food plants that gives key information (fill in specifics) and links to other relevant resources, so that there is less duplication of effort in creating crops datasets.

There's also (at least) 2 key groups of needs:

What's missing from these sources that doesn't meet your needs?

The main challenge in collaborating on something like this is defining what actual data we all need represented, and where that is the same and where that is different.

So what about starting to develop a linked data model? Or simply a set of models and their properties, which could be translated into linked data formats later.
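As an illustration of "a set of models and their properties, which could be translated into linked data formats later", here is a minimal sketch: a plain `Crop` model whose instances are serialized to JSON-LD on demand. The model's fields and both base IRIs are hypothetical placeholders; a real project would pick or mint its own vocabulary (and might map properties to existing vocabularies such as AGROVOC instead of a single `@vocab`).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical base IRIs; a real project would mint or reuse its own vocabulary.
VOCAB = "https://example.org/crop-vocab#"
BASE = "https://example.org/crops/"


@dataclass
class Crop:
    id: str
    binomial_name: str
    common_names: List[str] = field(default_factory=list)

    def to_jsonld(self) -> dict:
        """Translate the plain model into a JSON-LD node."""
        data = asdict(self)
        crop_id = data.pop("id")
        return {
            # @vocab maps every plain property name and the @type onto VOCAB.
            "@context": {"@vocab": VOCAB},
            "@id": BASE + crop_id,
            "@type": "Crop",
            **data,
        }


doc = Crop("zea-mays", "Zea mays", ["Maize", "Corn"]).to_jsonld()
print(json.dumps(doc, indent=2))
```

The design point is that the model stays simple and format-neutral; the linked data view is just one serialization of it.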

Quick background: I'm quite interested in this area, have worked on OpenFoodNetwork a fair bit, a little on an API for Growstuff and explored food data modelling on Freebase before it was eaten by Google.

tkeifer commented 8 years ago

I was thinking about this conversation over the weekend, trying to think about it at a high-level and had some insights... (bear with my explanation)

It seems to me we are struggling with a question of how to efficiently represent what is essentially plant genetics. For any given living thing really, subtle changes in genetic makeup result in traits which we (as humans) then classify into manageable groups. In our scenario, that means fruits and vegetables, and all their divisions. The result is a nearly infinite combination of characteristics that we could potentially need to represent in a database. As gene identification technology advances, we could find out that the diversity in our vegetables, which we currently see fairly myopically, is tremendously greater than we imagined. While "Tomato A" looks exactly like "Tomato B", it may have a single gene difference that makes it more cold-tolerant and, as such, would be referred to by a completely different name.

Obviously, we can't gene sequence every single vegetable and store that data for the average backyard gardener to search (though that would be cool), so we need to abstract it out a little bit. So looking at it from the opposite perspective, I said "how do two people currently differentiate between different crops?" I realized that we do this by evaluating a very small amount of traits, most of which are visual. I'll use peppers as an example: if we look at one that is round and orange, we all agree, "that is a habanero." In lieu of hard, scientific fact, we generally go with a loose naming convention by majority rule.

So what is my point? I envision some sort of object-storage mechanism, which allows attributes to be applied and then grouped through a crowd-sourced type of mechanism. "Object A" is placed into the database and a very small core set of attributes (height, sun requirements, spacing recommendations, etc.) is fixed. The rest - specifically names, varieties, etc... are left to a kind of crowd-sourced tagging mechanism. If 50 ppl look at a picture of our object and say "Thats a tomato", we go with tomato. If a tomato expert logs in and says "that's a cold-weather, cherry tomato" we apply the tags. There could be some sort of weighting applied to bubble good tags up.

I don't know if such an object-storage type of mechanism exists, but I thought I'd throw this out there and see what you guys thought.
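The "weighting applied to bubble good tags up" could be as simple as summing vote weights per tag and keeping tags above a threshold. This is a minimal sketch assuming each vote carries a weight (for instance, higher for a recognized expert); the `rank_tags` function, the weights, and the threshold are hypothetical choices, not an existing system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_tags(votes: List[Tuple[str, float]], min_score: float = 1.0) -> List[Tuple[str, float]]:
    """Sum each tag's vote weights; keep tags scoring at least min_score, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for tag, weight in votes:
        scores[tag] += weight
    kept = [(tag, score) for tag, score in scores.items() if score >= min_score]
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# 50 ordinary users (weight 1.0) plus one "expert" whose votes count 5.0 each.
votes = [("tomato", 1.0)] * 50 + [("cherry tomato", 5.0), ("cold-tolerant", 5.0), ("pepper", 1.0)]
print(rank_tags(votes, min_score=3.0))
```

Here the stray "pepper" vote falls below the threshold, while the expert's two tags survive alongside the majority "tomato". How weights are assigned is the hard, unsolved part of this approach.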

pmackay commented 8 years ago

If the models, properties, data, etc. are to be useful to a wide range of groups, I wouldn't use tagging. It would be much more beneficial to define a strongly typed set of models and properties. However, a system that lets people enter the information wiki-style, based on those models, could be good.

tkeifer commented 8 years ago

Could you explain "define a strongly typed set of models and properties" in more detail? I'm not sure I follow.

pmackay commented 8 years ago

Basically, what's now being debated in #5. So define a model, e.g. Crop, and the properties it can have, e.g. the list started here.
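To show the contrast with free-form tags, here is a minimal sketch of a strongly typed `Crop` model: each property has a declared type, enumerated values are restricted to a fixed set, and invalid records are rejected at construction time. The field names, the `Sun` values, and the validation rules are hypothetical illustrations, not the property list from #5.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Sun(Enum):
    FULL_SUN = "full sun"
    PARTIAL_SHADE = "partial shade"
    FULL_SHADE = "full shade"


@dataclass(frozen=True)
class Crop:
    binomial_name: str
    sun: Sun
    spacing_cm: float
    common_names: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Each check rejects a value that a free-form tag would accept silently.
        if len(self.binomial_name.split()) != 2:
            raise ValueError(f"expected 'Genus species', got {self.binomial_name!r}")
        if not isinstance(self.sun, Sun):
            raise TypeError(f"sun must be a Sun member, got {self.sun!r}")
        if self.spacing_cm <= 0:
            raise ValueError("spacing_cm must be positive")
```

`Crop("Solanum lycopersicum", Sun("full sun"), 45.0, ("tomato",))` is accepted, while `Crop("tomato", Sun.FULL_SUN, 45.0)` raises `ValueError`. Because every consumer can rely on the same shape, different apps can exchange records without guessing what a tag means.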

mstenta commented 8 years ago

@tkeifer Take a look at #5 - I'm basically suggesting we provide a very small core set of data and let other third-party datasets extend it.

mstenta commented 8 years ago

... that would allow for the basic set of attributes to be defined, and then other people to define "varieties" or "cultivars" that extend it. I think that jibes with what you're saying, yea?
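One lightweight way to get "a very small core set of data that third-party datasets extend" is layered lookup: a cultivar record falls back to the species-level core for anything it doesn't override. This sketch uses the standard library's `collections.ChainMap`; the `extend` helper, the attribute names, and the cultivar are hypothetical placeholders.

```python
from collections import ChainMap
from typing import Any, Dict


def extend(core: Dict[str, Any], *extensions: Dict[str, Any]) -> ChainMap:
    """Layer extension datasets over a core record; later extensions win.

    An empty dict goes first so any writes land there, never in the inputs.
    """
    return ChainMap({}, *reversed(extensions), core)


# Placeholder species-level core record from the shared dataset.
core = {"id": "solanum-lycopersicum", "common_name": "Tomato", "sun": "full sun", "frost_tolerant": False}
# A hypothetical third-party cultivar dataset extending that record.
cultivar = {"cultivar": "Hypothetical Cold Cherry", "fruit_size": "cherry", "frost_tolerant": True}

record = extend(core, cultivar)
print(record["frost_tolerant"], record["sun"])  # overridden value, then an inherited one
```

Since `ChainMap` only writes to its first mapping, edits to `record` never alter the shared core or the third-party dataset, which matches the idea of a canonical core that others build on rather than modify.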

tkeifer commented 8 years ago

I missed that... it looks close though! My experience has been that even the simplest of assumptions around types, variety, etc. tends to fail hardcore in the plant world, so I was trying to think of a method of reference that was super flexible and didn't involve many rules. I'm interested to see how that idea evolves.

andru commented 8 years ago

I think taxonomy is a useful compromise. We accept in naming a cultivar that the genetics are variable and that a term like Brassica oleracea 'Early Purple Sprouting' represents an arbitrary genetic community which we choose to label for our own needs.

Taxonomy accepts this because the alternative, attempting to model the huge complexity of genetics, is not only practically impossible, but I don't see why it would be desirable.

When a group of plants has genetically diverged from another enough to have different qualities, and defining that group of genetics as distinct is useful to humans, we assign a new name in order that we can communicate about it. To me the lack of 1-to-1 mapping with genetics is not a flaw we need to figure out, it's a fundamentally useful abstraction we can't do without.

"how do two people currently differentiate between different crops?" I realized that we do this by evaluating a very small amount of traits, most of which are visual.

I'd say visual traits are no more important to food crops than any of the other traits. There is also taste, aroma, texture, life stage timing, shelf life, environmental tolerances and preferences, etc. These things cannot be detected and recorded without rigorous study, and I wouldn't trust a digital crowd sourced methodology to get it right.

The rest - specifically names, varieties, etc... are left to a kind of crowd-sourced tagging mechanism. If 50 ppl look at a picture of our object and say "Thats a tomato", we go with tomato. If a tomato expert logs in and says "that's a cold-weather, cherry tomato" we apply the tags.

I find no fault in your method; it's just that this is more or less how taxonomy has worked for generations, and the result is the taxonomy and nomenclature we currently use.

I might be misunderstanding something in your proposal, but I don't see an improvement over current taxonomy. Could you expand on what problem it solves?

tkeifer commented 8 years ago

@andru - I hadn't yet seen the taxonomy that was started, so it was not meant to be a discussion of how another approach would be better really.

To clarify though... I was not suggesting we represent the plants genetically in the repository, only pointing out that in the absence of hard scientific differentiators (looking at pictures on the internet or browsing a farmers market, for example) people revert to visual indicators.

To use your example - the average person talking to a farmer is probably more unlikely to look at something and say "that is Brassica oleracea" than they are "that is Early Purple Sprouting" - so it might make sense to approach building a database from a less classification-intensive way than the traditional Family-Genus-Species model. This also takes into account the fact that a large majority of users may be operating below these levels anyway in their discussion of cultivars and varieties over genus and species.

Hope that helps clarify somewhat...

andru commented 8 years ago

the average person talking to a farmer is probably more unlikely to look at something and say "that is Brassica oleracea" than they are "that is Early Purple Sprouting"

Thanks for clarifying. Totally agreed that scientific nomenclature is unfamiliar to most people. I think we need to use the taxonomic model to provide structural relationships in the data, and have a very extensive list of common names in all languages, so that people can access the data in whatever way is familiar to them.
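A minimal sketch of that split, assuming taxa are stored with parent links (giving the structure) and common names are a separate lookup keyed by language, so users can enter through whichever name they know. The table layout, ids, and `resolve`/`lineage` helpers are hypothetical; only the tomato classification itself (family Solanaceae, genus Solanum, species Solanum lycopersicum) is real.

```python
from typing import Dict, List, Optional, Tuple

# Taxon table: id -> (rank, scientific name, parent id). Structure only; a real
# dataset would come from an authoritative checklist.
TAXA: Dict[str, Tuple[str, str, Optional[str]]] = {
    "solanaceae": ("family", "Solanaceae", None),
    "solanum": ("genus", "Solanum", "solanaceae"),
    "solanum-lycopersicum": ("species", "Solanum lycopersicum", "solanum"),
}

# Common names keyed by (language tag, lowercased name) -> taxon id.
COMMON_NAMES: Dict[Tuple[str, str], str] = {
    ("en", "tomato"): "solanum-lycopersicum",
    ("fr", "tomate"): "solanum-lycopersicum",
    ("de", "tomate"): "solanum-lycopersicum",
}


def resolve(name: str, lang: str) -> Optional[str]:
    """Map a familiar name in a given language to a taxon id, if known."""
    return COMMON_NAMES.get((lang, name.strip().lower()))


def lineage(taxon_id: str) -> List[str]:
    """Walk parent links upward to show where a taxon sits in the hierarchy."""
    path: List[str] = []
    current: Optional[str] = taxon_id
    while current is not None:
        rank, name, current = TAXA[current]
        path.append(f"{rank}: {name}")
    return path
```

A German speaker typing "Tomate" reaches the same record as an English speaker typing "tomato", and an app that only wants to show common names can ignore `lineage` entirely.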

roryaronson commented 8 years ago

@andru I agree that while the scientific nomenclature will likely not be used by most apps or people, it seems the best thing we have for structuring the data. Each app can then choose to only represent the common names if desired.

Mageistral commented 8 years ago

I had some thoughts after the discussion around climates: websites like this one, or the "TOWN#climate" sections on Wikipedia, provide really good info. Another thing is that seed providers give dates in a country/state context. I think it would not be that hard to link climates to a gardener's profile and derive dates for the gardener's context from the generic crop data.

I know this is not the priority but I wanted to write it down somewhere.
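A minimal sketch of deriving dates from generic crop data plus a gardener's profile, assuming the crop data stores sowing offsets in weeks relative to the last spring frost and the profile holds a last-frost date already looked up from local climate data. `SOW_OFFSET_WEEKS`, `GardenerProfile`, and `sowing_date` are hypothetical names, and the offsets are placeholders, not growing advice.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical generic crop data: sowing offset in weeks relative to the last
# spring frost (negative = before it). Placeholder values, not advice.
SOW_OFFSET_WEEKS = {
    "solanum-lycopersicum": -6,
    "pisum-sativum": -4,
}


@dataclass
class GardenerProfile:
    location: str
    last_frost: date  # assumed to be derived from local climate data


def sowing_date(crop_id: str, profile: GardenerProfile) -> date:
    """Shift the crop's generic offset onto this gardener's frost date."""
    return profile.last_frost + timedelta(weeks=SOW_OFFSET_WEEKS[crop_id])
```

For a profile with a last frost on 15 May 2024, the tomato offset gives 3 April 2024. The crop data stays location-independent; only the profile changes from one gardener to the next.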