NTR: machine learning technique

TrisSN commented 6 years ago

Hi. I'm a relatively new Data Curator at Scientific Data. Nice to meet you all!

NTR: machine learning technique
Definition: Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed. [Source: https://en.wikipedia.org/wiki/Machine_learning]
Subclass_of: data transformation
Synonyms: statistical technique

With two child terms:

NTR: supervised learning
Definition: Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. [Source: https://en.wikipedia.org/wiki/Supervised_learning]
Subclass_of: machine learning technique

NTR: unsupervised learning
Definition: Unsupervised machine learning is the machine learning task of inferring a function that describes the structure of "unlabeled" data (i.e. data that has not been classified or categorized). [Source: https://en.wikipedia.org/wiki/Unsupervised_learning]
Subclass_of: machine learning technique

turbomam commented 6 years ago

Great request. We're discussing this as part of the August 6th, 2018 OBI conference call. We would love to create these terms but need some more input:

Can you provide an example of usage? Once we create these terms, how would you use them?
Synonyms are valuable, but 'statistical technique' seems overly general... a test that outputs a critical value (like Student's t) doesn't seem like an example of machine learning.
We usually speak of processes, not techniques.
Parent class: plan specification instead of data transformation... Is the idea here to capture the parameters and algorithm that could define a machine learning process, or the execution of that process with some input and some output?
The OBI team needs to provide better docs or diagrams that show the differences between a design and a planned process.

TrisSN commented 6 years ago

1) Usage example: In creating one of our metadata summaries for a linguistics-based dataset, I need a single term to describe the technology that was used to generate two sets of derived data. One used a clustering (unsupervised) technique, and the other a Support Vector Machine (supervised). In fact, the 'supervised learning' term could be used for many popular techniques, such as Artificial/Deep/Convolutional Neural Networks or Reinforcement Learning, while 'unsupervised' could cover k-means, Hidden Markov Models and self-organising maps. However, with many tweaked/unusual machine learning techniques, I think the umbrella term may be most useful.

2) Regarding 'statistical technique': agreed! There are also terms like 'pattern recognition' and 'classification', but I don't think any of them are great synonyms. Perhaps better to leave it un-synonym-ised.

3) 'machine learning process', or simply 'machine learning', would be fine.

4) My goal was to capture the execution of the process with some input and output. Rather than the focus being on the machine learning process and all its details, the focus is on the biological/linguistic data provided to and output from the technique. I'm unsure about 'parent class: plan specification'. Would this still include unsupervised techniques, where the algorithm is specified but the output is not? (i.e., the goal is for the algorithm to suggest a good organisation of the input data).
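
To make the usage in point 1 concrete, here is a minimal sketch of how a curation script might choose between the specific terms and the umbrella term. The names `TERM_FOR_TECHNIQUE`, `UMBRELLA_TERM` and `pick_term` are hypothetical and not part of OBI or any Scientific Data tooling; the groupings simply reuse a few of the examples listed above.

```python
# Hypothetical lookup from a reported technique to the narrowest requested term.
# The table and function names are illustrative assumptions, not OBI content.
UMBRELLA_TERM = "machine learning"

TERM_FOR_TECHNIQUE = {
    "support vector machine": "supervised learning",
    "convolutional neural network": "supervised learning",
    "k-means": "unsupervised learning",
    "self-organising map": "unsupervised learning",
}


def pick_term(technique: str) -> str:
    """Return the specific term for a known technique, else the umbrella term."""
    key = technique.strip().lower()
    return TERM_FOR_TECHNIQUE.get(key, UMBRELLA_TERM)


if __name__ == "__main__":
    for name in ["Support Vector Machine", "k-means", "custom hybrid clusterer"]:
        print(f"{name}: {pick_term(name)}")
```

A tweaked or unusual technique that is missing from the table falls back to the umbrella term, which is the case the request for a parent term is meant to cover.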

turbomam commented 6 years ago

Great, thanks. We'll probably discuss this next during our conference call on August 13th. See http://obi-ontology.org/#contact-us

turbomam commented 6 years ago

I'm entering this now.

I just started wondering whether all machine learning processes are data transformations. Or is machine learning ever really a data transformation? When I train an SVM with data and labels, I create a function, not a new data set. Of course, when I apply the SVM to unlabeled data, I do get transformed output.

For now I'm going to make machine learning a subclass of planned process.

Next steps could include one of these decisions or something else

break out the training and application phases and say that the application is a data transformation? (Would that only apply to supervised approaches?)
convince ourselves that all machine learning processes are data transformations
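
The split between a training phase and an application phase can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a one-dimensional nearest-centroid classifier as a simple stand-in for an SVM, and the function names `train` and `apply_model` are made up for this example.

```python
from statistics import mean
from typing import Callable, List, Sequence


def train(points: Sequence[float], labels: Sequence[str]) -> Callable[[float], str]:
    """Training phase: labelled data goes in, a classifier function comes out."""
    # Compute one centroid (mean value) per label.
    centroids = {
        label: mean(p for p, l in zip(points, labels) if l == label)
        for label in set(labels)
    }

    def classify(x: float) -> str:
        # Assign the label whose centroid lies closest to x.
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return classify


def apply_model(model: Callable[[float], str], unlabeled: Sequence[float]) -> List[str]:
    """Application phase: unlabeled data goes in, labelled data comes out."""
    return [model(x) for x in unlabeled]


if __name__ == "__main__":
    model = train([1.0, 1.2, 3.8, 4.1], ["low", "low", "high", "high"])
    print(apply_model(model, [0.9, 4.0, 2.0]))  # prints ['low', 'high', 'low']
```

Here `train` returns a function rather than a data set, while `apply_model` maps one data set to another, which is the distinction the first option above would capture.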

TrisSN commented 6 years ago

Hi Mark,

Great – feel like I’m achieving things in the new job!

Re. machine learning being data transformations: I’ve mused on this before too. I’m now of the opinion they are. When you train an SVM, you’re transforming labelled data into a set of weights (that act as coefficients in the function you mentioned). That said, I’m not sure it is the best idea to call them all ‘data transformation’, as there seems to be a big divide between computational/mathsy people and everyone else in what this term means to them.
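
As a rough illustration of the "labelled data in, weights out" view, here is a minimal sketch that uses a perceptron, a much simpler linear classifier than an SVM, chosen only because it fits in a few lines. The function names and the toy data are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    samples: Sequence[Tuple[float, float]],
    labels: Sequence[int],  # each label is +1 or -1
    epochs: int = 20,
) -> List[float]:
    """Transform labelled samples into a weight vector [w1, w2, bias]."""
    w1, w2, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            score = w1 * x1 + w2 * x2 + bias
            if y * score <= 0:  # misclassified: nudge the weights toward y
                w1 += y * x1
                w2 += y * x2
                bias += y
    return [w1, w2, bias]


def predict(weights: Sequence[float], x: Tuple[float, float]) -> int:
    """Use the weights as coefficients of the learned decision function."""
    w1, w2, bias = weights
    return 1 if w1 * x[0] + w2 * x[1] + bias > 0 else -1


if __name__ == "__main__":
    samples = [(2.0, 1.0), (3.0, 2.0), (-1.0, -2.0), (-2.0, -1.0)]
    labels = [1, 1, -1, -1]
    weights = train_perceptron(samples, labels)
    print(weights)
    print([predict(weights, s) for s in samples])
```

The output of training is just a short list of numbers, so in that narrow sense the training step maps one data set to another, even though the result is then used as a function.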

Regards, Tris

From: Mark A. Miller [mailto:notifications@github.com] Sent: 22 August 2018 14:46 To: obi-ontology/obi Cc: Tristan Matthews; Author Subject: Re: [obi-ontology/obi] NTR: machine learning technique (#948)


turbomam commented 6 years ago

Here's what I have entered into the active development "obi-edit.owl" file. It may be a few days before it gets pushed to production/release. I guess the term IDs could possibly change too.

Let me know how you'd like to be listed as a contributor (I have used your GitHub handle for now)

I think @cstoeckert and I will work on 'transcriptome assembly' #950 later this week.

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_1110208">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/OBI_0000011"/>
        <obo:IAO_0000111>machine learning</obo:IAO_0000111>
        <obo:IAO_0000115>A planned process in which...

statistical techniques are used to give computers the ability to learn about patterns in data, without being explicitly programmed.  Learning is defined in this case as progressively improving performance on a specific task.</obo:IAO_0000115>
        <obo:IAO_0000117>Mark A. Miller</obo:IAO_0000117>
        <obo:IAO_0000119>paraphrased from https://en.wikipedia.org/wiki/Machine_learning</obo:IAO_0000119>
        <dc:contributor>https://github.com/TrisSN</dc:contributor>
        <rdfs:label>machine learning</rdfs:label>
    </owl:Class>

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_1110209">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/OBI_1110208"/>
        <obo:IAO_0000111>supervised machine learning</obo:IAO_0000111>
        <obo:IAO_0000115>A machine learning process using a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.</obo:IAO_0000115>
        <obo:IAO_0000117>Mark A. Miller</obo:IAO_0000117>
        <obo:IAO_0000119>paraphrased from https://en.wikipedia.org/wiki/Supervised_learning</obo:IAO_0000119>
        <dc:contributor>https://github.com/TrisSN</dc:contributor>
        <rdfs:label>supervised machine learning</rdfs:label>
    </owl:Class>

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_1110210">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/OBI_1110208"/>
        <obo:IAO_0000111>unsupervised machine learning</obo:IAO_0000111>
        <obo:IAO_0000115>A machine learning process that infers a function describing the structure of &quot;unlabeled&quot; data (i.e. data that has not been classified or categorized).</obo:IAO_0000115>
        <obo:IAO_0000117>Mark A. Miller</obo:IAO_0000117>
        <obo:IAO_0000119>paraphrased from https://en.wikipedia.org/wiki/Unsupervised_learning</obo:IAO_0000119>
        <dc:contributor>https://github.com/TrisSN</dc:contributor>
        <rdfs:label>unsupervised machine learning</rdfs:label>
    </owl:Class>
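
For anyone who wants to check the resulting hierarchy programmatically, here is a minimal sketch that reads labels and parents from a trimmed copy of the classes above using Python's standard library. The wrapping `rdf:RDF` element, the namespace declarations and the helper name `read_classes` are assumptions added to make the snippet self-contained; a real check would load "obi-edit.owl" itself, and the term IDs may change before release.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace IRIs; the rdf:RDF wrapper below is an assumption
# added so that the trimmed classes parse on their own.
NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

OBO = "http://purl.obolibrary.org/obo/"

OWL_SNIPPET = f"""<rdf:RDF
    xmlns:owl="{NS['owl']}"
    xmlns:rdf="{NS['rdf']}"
    xmlns:rdfs="{NS['rdfs']}">
  <owl:Class rdf:about="{OBO}OBI_1110208">
    <rdfs:subClassOf rdf:resource="{OBO}OBI_0000011"/>
    <rdfs:label>machine learning</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="{OBO}OBI_1110209">
    <rdfs:subClassOf rdf:resource="{OBO}OBI_1110208"/>
    <rdfs:label>supervised machine learning</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="{OBO}OBI_1110210">
    <rdfs:subClassOf rdf:resource="{OBO}OBI_1110208"/>
    <rdfs:label>unsupervised machine learning</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def read_classes(xml_text):
    """Return {class IRI: (label, parent IRI or None)} for each owl:Class."""
    root = ET.fromstring(xml_text)
    about = "{%s}about" % NS["rdf"]
    resource = "{%s}resource" % NS["rdf"]
    classes = {}
    for cls in root.findall("owl:Class", NS):
        label = cls.findtext("rdfs:label", namespaces=NS)
        parent = cls.find("rdfs:subClassOf", NS)
        parent_iri = parent.get(resource) if parent is not None else None
        classes[cls.get(about)] = (label, parent_iri)
    return classes


if __name__ == "__main__":
    classes = read_classes(OWL_SNIPPET)
    root_iri = OBO + "OBI_1110208"
    for iri, (label, parent_iri) in classes.items():
        if parent_iri == root_iri:
            print(f"{label} is a subclass of {classes[root_iri][0]}")
```

Run as a script, this should report both child terms as subclasses of "machine learning".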

jamesaoverton commented 6 years ago

Closed by #948
