Open petermr opened 4 years ago
I've made some headway in processing ContentMine output using KNIME. I can now read in the full text of articles and tag it up using OSCAR.
We should also be able to tag using the dictionaries that Ambarish is creating.
I'm picking up the conversation from our initial email chat. I'm currently using KNIME to see if we can use the output of getpapers and ami as feedstock for some further analysis.
There are two main areas I'm investigating:
I should hopefully have something to show over the next few days or so.
So: I suppose the next question is - if we're looking for telling correlations between conditions and substances, should we be looking in the abstract or the body? I'd say the former as it's most likely to spell out key conclusions.
So: I've now got KNIME reading in the dictionaries and tagging up documents with them. This is good, but would be even better if there was an easy way of defining one's own tag set. I have to make do with the standard set of tags. The only way of doing this is in Java, and I know absolutely nothing about Java programming. So if anyone wants to take this on, please, be my guest.
On Mon, Oct 14, 2019 at 4:06 PM Clyde Davies notifications@github.com wrote:
So: I've now got KNIME reading in the dictionaries and tagging up documents with them. This is good, but would be even better if there was an easy way of defining one's own tag set. I have to make do with the standard set of tags.
What are these tags?
The only way of doing this is in Java, https://www.knime.com/for-developers-integration-of-custom-tag-sets and I know absolutely nothing about Java programming.
I can understand what is in the tutorial. It also uses Eclipse which is a standard Java IDE and I'm familiar with it.
So if anyone wants to take this on, please, be my guest.
If you can spell out what is required we can estimate the effort. (Most things at this stage are tweaking examples, not writing code from scratch).
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/petermr/CEVOpen/issues/38?email_source=notifications&email_token=AAFTCS62SHPZQRZJFT4FNDLQOSDGBA5CNFSM4JAOYUA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBFDMTI#issuecomment-541734477, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTCSZRRWNUYHNKK75S6CTQOSDGBANCNFSM4JAOYUAQ .
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
KNIME passes data tables from node to node. These support a Document column type. Tagging nodes insert, well, tags into the document to mark recognised terms. There are the typical part-of-speech (POS) tag categories but also some more specialised ones. The blue text [OSCAR(ONT)] shown in the screenshot means that the OSCAR category has recognised an ontology entity and tagged it up appropriately. I think if we are going to take this further then we probably need an AMI category with PLANT, ACTIVITY, INSTRUMENT, PLANTPART etc. tags, one for each dictionary and the entity class it recognises. Currently I'm having to use OSCAR tag types with a tag value of CUST to mark these up. This is nowhere near granular enough for our purposes. And it's wrong.
This is what I've got so far. You can see the three taggers at the end of the workflow:
I leave the dictionary tagging until last
You can see what happens: it's recognised all the terms in the dictionaries but has been unable to differentiate between them:
I think the most immediate advantage of this approach is that it allows us to visually test the dictionaries. The word 'and' is tagged, but I don't yet know why.
I just discovered a delightful feature of KNIME hub which makes it incredibly easy to overwrite a workflow with an old version! So I am going to have to recreate that one. But I have the screenshot so that at least will save me time figuring it out all over again.
What dictionaries are running and how does each work? I am a bit mystified by the words which are tagged. I now understand that each dictionary provides a single class of tag. I think your approach is reasonable, but I need to know how it decides to tag.
I'm just lumping all the dictionaries together right now. I think I can get away with just loading each dictionary and assigning a custom tag value individually to the matched terms. That would disambiguate effectively as there aren't so many of them.
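A minimal sketch of the per-dictionary idea, written in plain Python outside KNIME, might look like the following. The dictionary contents, `build_matchers` and `tag_text` are hypothetical names made up for illustration; this is not how the KNIME dictionary tagger is implemented, only a way to see that giving each dictionary its own tag value keeps the matches distinguishable.

```python
import re

# Hypothetical mini-dictionaries, for illustration only; the real
# dictionaries are much larger and are not reproduced here.
DICTIONARIES = {
    "PLANT": ["lavandula angustifolia", "mentha piperita"],
    "COMPOUND": ["linalool", "menthol"],
    "ACTIVITY": ["antibacterial"],
}


def build_matchers(dictionaries):
    """Compile one case-insensitive, whole-word pattern per dictionary."""
    matchers = {}
    for tag, terms in dictionaries.items():
        # Longest terms first so multi-word terms win over shorter ones.
        ordered = sorted(terms, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
        matchers[tag] = re.compile(pattern, re.IGNORECASE)
    return matchers


def tag_text(text, matchers):
    """Return (start, end, matched_text, tag) tuples sorted by position."""
    hits = []
    for tag, matcher in matchers.items():
        for m in matcher.finditer(text):
            hits.append((m.start(), m.end(), m.group(0), tag))
    return sorted(hits)


if __name__ == "__main__":
    sentence = "Linalool from Lavandula angustifolia showed antibacterial activity."
    for start, end, term, tag in tag_text(sentence, build_matchers(DICTIONARIES)):
        print(f"{tag:10s} {term!r} at {start}-{end}")
```

Because matching is whole-word, a stray hit on a word like 'and' would point to an entry in one of the dictionaries themselves, which is the kind of thing this sort of visual check is meant to surface.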
Thanks. Can you post a typical output?
I'll have to rework that workflow anyway, so when I do I will get it to generate the output. Might be a little while doing that, though.
-- Clyde
OK, I've been thinking about the best way to share this. And it's the most obvious way: get the workflows into GitHub. I suggest we create a Knime folder (not sure where exactly) in the repo and put the Knime workflows as immediate children. A workflow is simply a folder hierarchy, so it should fit in nicely.
This will also allow us to use relative paths when referencing our existing data files. Which should mean no pesky config changes for new users.
Great, as always happy to talk.
How about if I create a top-level folder workflows and then one immediately under it called knime? Then I put my workflows into that? That way if we adopt any other tools, we can put them into their own tool-specific folders.
Quick question: I have a good working knowledge of git but am no expert. Do .gitignore files only work at the top level of the repo, or can they be declared lower down so they're folder-specific?
Do whatever makes sense. GitHub is free - we can create a fresh repo if it doesn't work...
Do you update the master branch directly or is it all through pull requests? I've just created a branch that I'm happy with and could do with merging into master.
At present we generally all push to master directly. That's because maintaining a consistent policy on branches is not easy when people aren't familiar with GitHub. I know it's crude but so far there have been no problems.
OK. On Chem4Word we're used to processing changes through pull requests. I'll still create task branches, just to keep things isolated, but I'll merge in directly.
Sure, it's more critical for code, especially where it overlaps. Here you are creating your own contribution and - so far - there probably won't be potential for conflicts. It might happen if several people want to author a dictionary.
I'll work as I suggested then, until we end up with more people working on the workflows.
Huge Thanks for all the work. I am installing MACOSX KNIME. Then we can work together. I'd be surprised if we couldn't make rapid progress. We probably need to talk.
UPDATE have installed it. Point me at a CEV workflow!
Yes, we probably do. I might have some time tomorrow night. After that it will be Sunday at the earliest. I'm hoping I can have some more to show you by then
OK name a time... (Check this is 2019-10-16)
Let's say 21:00 UTC Thursday (8 PM) - provisionally
Which parallel universe? 21:00 Wednesday, Coordinated Universal Time (UTC) is 22:00 Wednesday, in Cambridge, UK
Oh sorry, having a 'blonde' moment. 19:00 UTC! (8 PM!)
I have had a meeting cancelled tomorrow, so I can do any time between 12 and 2pm if you'd prefer that? I know I would.
That sounds good. Let's say 13:00 tomorrow (Thursday).
Alright. Catch you later
Can we catch up again? (I know you are bravely occupied elsewhere today) I am struggling with merging the changes you and I made to my workflows so they run on MACOSX. I think we have to agree a branch strategy. But I also think we have to consider how to configure Unix/MAC effortlessly.
I have lots of conflicts...
I will probably delete the workflows as I don't want to overwrite yours. I think we need branches for this.
We need to get relative paths working in KNIME as this should resolve our config issues.
Thinking about this some more - I haven't had much time to do anything over the past few days - we could probably do with investigating whether these issues with KNIME and relative paths occur purely on Windows.
When I do actually have some time I'll investigate setting up an Ubuntu VM on Azure (I have a budget) to see whether these are just Windows being its customary pain in the nether regions.
great.
Still getting the same issue on an Ubuntu box. Watch this space.
Got it.
There should be a new workflow knime_tagger that should work on any box. It does however output a massive CSV file listing the term co-occurrence so be careful when you run it. I have hopefully configured git so it ignores this file.
So what we need now is development of a custom tag set so we can tag up with every single kind of term we have. I'm out of my depth when it comes to this. Over to you @petermr
After looking at the dictionaries, I think we need the following tags
What overarching tag type name do you want to give them? DAVE? AMI? CEVOPEN?
Depends on how @petermr will refer to it in the article, I think.
Call them CEV
Thanks
On Tue, 29 Oct 2019, 14:57 Clyde Davies, notifications@github.com wrote:
After looking at the dictionaries, I think we need the following tags
- ACTIVITY
- COMPOUND
- INSTRUMENT
- PLANT
- PLANTPARTS
- PROCESS
- TARGETORGANISM
What overarching tag type name do you want to give them? DAVE? AMI? CEVOPEN?
You'll have to do that. I've tried getting my head around the Eclipse way and producing a Java KNIME node but I'm well out of my depth, sorry.
Do we have any other Java resource we can call on? You have too much to do as it is.
I have just run the workflow on the oil1000 search. It runs fine. It's just that the co-occurrence results file is 4.6 GB!
On Thu, Oct 31, 2019 at 10:51 AM Clyde Davies notifications@github.com wrote:
I have just run the workflow on the oil1000 search. It runs fine.
Excellent.
It's just that the co-occurrence results file is 4.6 GB!
It may be necessary to limit the prefix and suffix search to certain sections, or to limit the prefix and suffix strings.
Am working on getting compound names out of tables.
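One way to picture the "limit to certain sections" suggestion is the sketch below. It is written in plain Python and assumes, hypothetically, that each document has already been reduced to a mapping from section name to a list of tagged terms; `cooccurrence_counts` and the section names are made up for illustration and say nothing about how the KNIME workflow builds its table. It keeps one aggregated count per term pair rather than one row per occurrence, which is the other obvious lever on output size.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs, sections=("abstract",)):
    """Count unordered term pairs that share a section.

    `docs` is an iterable of dicts mapping a section name to the list of
    tagged terms found in that section. Only the sections named in
    `sections` are scanned; everything else is ignored.
    """
    counts = Counter()
    for doc in docs:
        for name in sections:
            # De-duplicate and sort so each pair has one canonical order.
            terms = sorted(set(doc.get(name, [])))
            counts.update(combinations(terms, 2))
    return counts


if __name__ == "__main__":
    docs = [
        {"abstract": ["linalool", "lavender", "linalool"], "body": ["menthol", "mint"]},
        {"abstract": ["lavender", "linalool"]},
    ]
    for pair, n in cooccurrence_counts(docs).most_common():
        print(pair, n)
```

Restricting `sections` to the abstract also fits the earlier question about whether the abstract or the body is the better place to look for correlations.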
I might be able to help with the compound names. KNIME has an Oscar tagger node, if that would do the job.
I've just modified the workflow to scan the oil1000 fulltext.xml files, extract the tables and tag up the compound names using the OSCAR tagger. I can also extract them along with the original document ref. Any use?
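For anyone wanting to try the table-extraction step outside KNIME, a rough Python sketch follows. It assumes the fulltext.xml files use un-namespaced `table`, `tr`, `th` and `td` elements (as JATS-style XML typically does), which has not been checked against the oil1000 files; `extract_tables` is a hypothetical helper, and the OSCAR tagging step is not reproduced here.

```python
import xml.etree.ElementTree as ET


def extract_tables(xml_text):
    """Return each <table> as a list of rows, each row a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.iter("table"):
        rows = []
        for tr in table.iter("tr"):
            # Join nested inline markup (e.g. <italic>) and collapse whitespace.
            cells = [
                " ".join("".join(cell.itertext()).split())
                for cell in tr
                if cell.tag in ("th", "td")
            ]
            rows.append(cells)
        tables.append(rows)
    return tables


if __name__ == "__main__":
    sample = (
        "<article><body><table-wrap><table>"
        "<thead><tr><th>Compound</th><th>%</th></tr></thead>"
        "<tbody><tr><td><italic>p</italic>-cymene</td><td>12.3</td></tr></tbody>"
        "</table></table-wrap></body></article>"
    )
    for table in extract_tables(sample):
        for row in table:
            print(row)
```

The first column of each row could then be handed to whatever name recogniser is in use, keeping the source file name alongside it as the document reference.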
Is this sort of analysis any use?
Now converting the names to structures!
Yes, I think it's very useful on a per-document basis!
On Thu, Oct 31, 2019 at 1:39 PM Clyde Davies notifications@github.com wrote:
Is this sort of analysis any use? [image: image] https://user-images.githubusercontent.com/10074162/67951640-d6b75d80-fbe3-11e9-8d28-407c4a17613b.png
@deadlyvices has been exploring this and reporting in email.
ACTION: copy any relevant past emails here...