petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

Explore KNIME functionality on articles and CProjects #38

Open petermr opened 4 years ago

petermr commented 4 years ago

@deadlyvices has been exploring this and reporting in email.

ACTION: copy any relevant past emails here...

petermr commented 4 years ago

I've made some headway in processing ContentMine output using KNIME. I can now read in the full text of articles and tag it up using OSCAR.

[image]

We should also be able to tag using the dictionaries that Ambarish is creating.

deadlyvices commented 4 years ago

I'm picking up the conversation from our initial email chat. I'm currently using KNIME to see if we can use the output of getpapers and ami as feedstock for some further analysis.

There are two main areas I'm investigating:

I should hopefully have something to show over the next few days or so.

deadlyvices commented 4 years ago

So: I suppose the next question is - if we're looking for telling correlations between conditions and substances, should we be looking in the abstract or the body? I'd say the former as it's most likely to spell out key conclusions.

deadlyvices commented 4 years ago

So: I've now got KNIME reading in the dictionaries and tagging up documents with them. This is good, but would be even better if there was an easy way of defining one's own tag set. I have to make do with the standard set of tags. The only way of defining a custom set is in Java, and I know absolutely nothing about Java programming. So if anyone wants to take this on, please, be my guest.

petermr commented 4 years ago

On Mon, Oct 14, 2019 at 4:06 PM Clyde Davies notifications@github.com wrote:

So: I've now got KNIME reading in the dictionaries and tagging up documents with them. This is good, but would be even better if there was an easy way of defining one's own tag set. I have to make do with the standard set of tags.

What are these tags?

The only way of doing this is in Java, https://www.knime.com/for-developers-integration-of-custom-tag-sets and I know absolutely nothing about Java programming.

I can understand what is in the tutorial. It also uses Eclipse which is a standard Java IDE and I'm familiar with it.

So if anyone wants to take this on, please, be my guest.

If you can spell out what is required we can estimate the effort. (Most things at this stage are tweaking examples, not writing code from scratch).

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

deadlyvices commented 4 years ago

KNIME passes data tables from node to node. These support a Document column type. Tagging nodes insert, well, tags into the document to mark recognised terms. There are your typical part-of-speech POS tag categories but also some more specialised ones. The blue text [OSCAR(ONT)] shown in the screenshot means that the OSCAR category has recognised an ontology entity and tagged it up appropriately. I think if we are going to take this further then we probably need an AMI category with PLANT, ACTIVITY, INSTRUMENT, PLANTPART etc. tags for each dictionary and the entity classes they recognise. Currently I'm having to use OSCAR tag types with a tag value of CUST to mark these up. This is nowhere near granular enough for our purposes. And it's wrong.

deadlyvices commented 4 years ago

This is what I've got so far. You can see the three taggers at the end of the workflow: [image]

I leave the dictionary tagging until last

deadlyvices commented 4 years ago

You can see what happens: it's recognised all the terms in the dictionaries but has been unable to differentiate between them: [image]

deadlyvices commented 4 years ago

I think the most immediate advantage of this approach is that it allows us to visually test the dictionaries. The word 'and' is tagged, for example, but it is not clear why.

deadlyvices commented 4 years ago

I just discovered a delightful feature of KNIME hub which makes it incredibly easy to overwrite a workflow with an old version! So I am going to have to recreate that one. But I have the screenshot so that at least will save me time figuring it out all over again.

petermr commented 4 years ago

What dictionaries are running and how does each work? I am a bit mystified by the words which are tagged. I now understand that each dictionary provides a single class of tag. I think your approach is reasonable, but I need to know how it decides to tag.

deadlyvices commented 4 years ago

I'm just lumping all the dictionaries together right now. I think I can get away with just loading each dictionary and assigning a custom tag value individually to the matched terms. That would disambiguate effectively as there aren't so many of them.
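A minimal sketch of that per-dictionary idea, written in plain Python rather than as a KNIME node. The dictionary contents, the tag names used as keys, and the `tag_terms` helper are illustrative assumptions, not the actual CEVOpen dictionaries or any KNIME API:

```python
import re

def tag_terms(text, dictionaries):
    """Return (start, end, matched_text, tag) tuples sorted by position.

    `dictionaries` maps a tag value (one per dictionary) to its list of terms,
    so every match carries the name of the dictionary it came from.
    """
    hits = []
    for tag, terms in dictionaries.items():
        for term in terms:
            # Whole-word, case-insensitive match for each dictionary term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((match.start(), match.end(), match.group(0), tag))
    return sorted(hits)

# Hypothetical miniature dictionaries, one tag value per dictionary.
dictionaries = {
    "PLANT": ["lavender", "thyme"],
    "ACTIVITY": ["antibacterial"],
    "PLANTPART": ["leaf", "leaves"],
}

text = "Thyme leaves showed antibacterial activity."
hits = tag_terms(text, dictionaries)
for start, end, word, tag in hits:
    print(f"{start:>3}-{end:<3} {word:<15} {tag}")
```

Because each dictionary is applied under its own tag value, a match for a plant name can no longer be confused with a match for an activity, which is the disambiguation described above.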

petermr commented 4 years ago

Thanks. Can you post a typical output?

deadlyvices commented 4 years ago

I'll have to rework that workflow anyway, so when I do I will get it to generate the output. Might be a little while doing that, though.

deadlyvices commented 4 years ago

OK, I've been thinking about the best way to share this. And it's the most obvious way: get the workflows into GitHub. I suggest we create a KNIME folder (not sure where exactly) in the repo and put the KNIME workflows as immediate children. A workflow is simply a folder hierarchy, so it should fit in nicely.
This will also allow us to use relative paths when referencing our existing data files, which should mean no pesky config changes for new users.
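To illustrate why relative references avoid per-user configuration, here is a small Python sketch (not KNIME configuration). The folder layout, file names, and the `resolve_relative` helper are hypothetical:

```python
import posixpath

def resolve_relative(workflow_dir, relative):
    """Join a workflow folder with a relative reference and collapse '..' segments."""
    return posixpath.normpath(posixpath.join(workflow_dir, relative))

# Hypothetical layout: the workflow lives two levels below the repo root,
# and the data file lives in a sibling folder of the repo.
reference = "../../dictionary/plant.xml"

# Two users with different checkout locations use the same stored reference.
for checkout in ("/home/alice/CEVOpen", "/Users/bob/git/CEVOpen"):
    workflow_dir = posixpath.join(checkout, "knime", "my_workflow")
    print(resolve_relative(workflow_dir, reference))
```

The stored reference never mentions where the repository was cloned, so the same workflow resolves to the right data file on either machine.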

petermr commented 4 years ago

Great, as always happy to talk.

deadlyvices commented 4 years ago

How about if I create a top-level folder workflows and then one immediately under it, knime? Then I put my workflows into that? That way, if we adopt any other tools, we can put them into their own tool-specific folders.

deadlyvices commented 4 years ago

Quick question: I have a good working knowledge of git but am no expert. Do .gitignore files only work at the top level of the repo, or can they be declared lower down so they're folder-specific?

petermr commented 4 years ago

Do whatever makes sense. GitHub is free, so we can create a fresh repo if it doesn't work...

deadlyvices commented 4 years ago

Do you update the master branch directly or is it all through pull requests? I've just created a branch that I'm happy with and could do with merging into master.

petermr commented 4 years ago

At present we generally all push to master directly. That's because maintaining a consistent policy on branches is not easy when people aren't familiar with GitHub. I know it's crude, but so far there have been no problems.

deadlyvices commented 4 years ago

OK. On Chem4Word we're used to processing changes through pull requests. I'll still create task branches, just to keep things isolated, but I'll merge them in directly.

petermr commented 4 years ago

Sure. It's more critical for code, especially where it overlaps. Here you are creating your own contribution and, so far, there probably won't be potential for conflicts. It might happen if several people want to author a dictionary.

deadlyvices commented 4 years ago

I'll work as I suggested then, until we end up with more people working on the workflows.

petermr commented 4 years ago

Huge Thanks for all the work. I am installing MACOSX KNIME. Then we can work together. I'd be surprised if we couldn't make rapid progress. We probably need to talk.

UPDATE have installed it. Point me at a CEV workflow!

deadlyvices commented 4 years ago

Yes, we probably do. I might have some time tomorrow night. After that it will be Sunday at the earliest. I'm hoping I can have some more to show you by then.

petermr commented 4 years ago

OK name a time... (Check this is 2019-10-16)

deadlyvices commented 4 years ago

Let's say 21:00 UTC Thursday (8 PM) - provisionally

petermr commented 4 years ago

Which parallel universe? 21:00 Wednesday, Coordinated Universal Time (UTC) is 22:00 Wednesday, in Cambridge, UK

deadlyvices commented 4 years ago

Oh sorry. having a 'blonde' moment. 19:00 UTC! (8 PM!)

deadlyvices commented 4 years ago

I have had a meeting cancelled tomorrow, so I can do any time between 12 and 2pm if you'd prefer that? I know I would.

petermr commented 4 years ago

That sounds good. Let's say 13:00 tomorrow (Thursday).

deadlyvices commented 4 years ago

Alright. Catch you later

petermr commented 4 years ago

Can we catch up again? (I know you are bravely occupied elsewhere today) I am struggling with merging the changes you and I made to my workflows so they run on MACOSX. I think we have to agree a branch strategy. But I also think we have to consider how to configure Unix/MAC effortlessly.

I have lots of conflicts...

I will probably delete the workflows as I don't want to overwrite yours. I think we need branches for this.

deadlyvices commented 4 years ago

We need to get relative paths working in KNIME as this should resolve our config issues.

deadlyvices commented 4 years ago

Thinking about this some more (I haven't had much time to do anything over the past few days), we could probably do with investigating whether these issues with KNIME and relative paths occur only on Windows.
When I do actually have some time I'll investigate setting up an Ubuntu VM on Azure (I have a budget) to see whether this is just Windows being its customary pain in the nether regions.

petermr commented 4 years ago

great.

deadlyvices commented 4 years ago

Still getting the same issue on an Ubuntu box. Watch this space.

deadlyvices commented 4 years ago

Got it.
There should be a new workflow knime_tagger that should work on any box. It does however output a massive CSV file listing the term co-occurrence so be careful when you run it. I have hopefully configured git so it ignores this file.

So what we need now is development of a custom tag set so we can tag up with every single kind of term we have. I'm out of my depth when it comes to this. Over to you @petermr

deadlyvices commented 4 years ago

After looking at the dictionaries, I think we need the following tags:

  • ACTIVITY
  • COMPOUND
  • INSTRUMENT
  • PLANT
  • PLANTPARTS
  • PROCESS
  • TARGETORGANISM

What overarching tag type name do you want to give them? DAVE? AMI? CEVOPEN?

EmanuelFaria commented 4 years ago

Depends on how @petermr will refer to it in the article, I think.

petermr commented 4 years ago

Call them CEV

Thanks

On Tue, 29 Oct 2019, 14:57 Clyde Davies, notifications@github.com wrote:

After looking at the dictionaries, I think we need the following tags

  • ACTIVITY
  • COMPOUND
  • INSTRUMENT
  • PLANT
  • PLANTPARTS
  • PROCESS
  • TARGETORGANISM

What overarching tag type name do you want to give them? DAVE? AMI? CEVOPEN?

deadlyvices commented 4 years ago

You'll have to do that. I've tried getting my head around the Eclipse way and producing a Java KNIME node but I'm well out of my depth, sorry.

Do we have any other Java resource we can call on? You have too much to do as it is.

deadlyvices commented 4 years ago

I have just run the workflow on the oil1000 search. It runs fine. It's just that the co-occurrence results file is 4.6 GB!

petermr commented 4 years ago

On Thu, Oct 31, 2019 at 10:51 AM Clyde Davies notifications@github.com wrote:

I have just run the workflow on the oil1000 search. It runs fine.

Excellent.

It's just that the co-occurrence results file is 4.6 GB!

It may be necessary to limit the prefix and suffix search to certain sections, or to limit the prefix and suffix strings.
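A different, complementary way to keep the output small is to aggregate counts per term pair instead of writing one row per co-occurrence. This is a Python sketch under assumptions, not what the KNIME workflow currently does; the `cooccurrence_counts` helper and the sample terms are hypothetical:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(tagged_units):
    """Count how many text units (e.g. sentences or sections) contain each term pair.

    `tagged_units` is an iterable of term lists, one list per unit.
    Pairs are stored in sorted order so (a, b) and (b, a) share one key.
    """
    counts = Counter()
    for terms in tagged_units:
        unique_terms = sorted(set(terms))
        for pair in combinations(unique_terms, 2):
            counts[pair] += 1
    return counts

# Hypothetical tagged sentences.
sentences = [
    ["thyme", "antibacterial"],
    ["thyme", "antibacterial", "leaf"],
    ["lavender", "leaf"],
]
counts = cooccurrence_counts(sentences)
for pair, n in counts.most_common():
    print(pair, n)
```

The result grows with the number of distinct pairs rather than with the number of occurrences, and restricting the input units to chosen sections would reduce it further.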

Am working on getting compound names out of tables.

deadlyvices commented 4 years ago

I might be able to help with the compound names. KNIME has an OSCAR tagger node, if that would do the job.

deadlyvices commented 4 years ago

I've just modified the workflow to scan the oil1000 fulltext.xml files, extract the tables and tag up the compound names using the OSCAR tagger. I can also extract them along with the original document ref. Any use?
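For anyone who wants to try the table-extraction step outside KNIME, here is a rough Python sketch. It assumes the fulltext.xml files use un-namespaced `table`, `tr`, `th` and `td` elements (as JATS-style XML commonly does); the `extract_tables` helper and the sample document are made up for illustration, and the OSCAR tagging step is not reproduced:

```python
import xml.etree.ElementTree as ET

def extract_tables(xml_text):
    """Return each table as a list of rows, each row a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.iter("table"):
        rows = []
        for tr in table.iter("tr"):
            # itertext() flattens inline markup such as <italic> inside a cell.
            cells = ["".join(cell.itertext()).strip()
                     for cell in tr if cell.tag in ("td", "th")]
            rows.append(cells)
        tables.append(rows)
    return tables

# Hypothetical fragment standing in for a real fulltext.xml.
sample = """<article><body><table-wrap id="T1"><table>
<thead><tr><th>Compound</th><th>%</th></tr></thead>
<tbody><tr><td>limonene</td><td>12.3</td></tr>
<tr><td><italic>p</italic>-cymene</td><td>4.1</td></tr></tbody>
</table></table-wrap></body></article>"""

tables = extract_tables(sample)
for row in tables[0]:
    print(row)
```

The first column of each row could then be passed to a chemical-name tagger, keeping the source document reference alongside it.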

deadlyvices commented 4 years ago

Is this sort of analysis any use? [image]

deadlyvices commented 4 years ago

Now converting the names to structures!

petermr commented 4 years ago

Yes, I think it's very useful on a per-document basis!

On Thu, Oct 31, 2019 at 1:39 PM Clyde Davies notifications@github.com wrote:

Is this sort of analysis any use? [image: image] https://user-images.githubusercontent.com/10074162/67951640-d6b75d80-fbe3-11e9-8d28-407c4a17613b.png
