rrtucci / mappa_mundi

Causal DAG Extraction from Text (DEFT)
MIT License

My dream is if we could start a project at github/huggingface, called Mappa #2

Open jbdatascience opened 1 year ago

jbdatascience commented 1 year ago

In about a week I will retire from work, so from then on I could look more closely into these issues. But I think we are already on the right track!

A question: should we do all our communication via this GitHub channel? Or could we choose another one? Do you have a proposal? Please let me know.

My dream is that we could start a project on GitHub/Hugging Face, called Mappa Mundi V2, run by us two and others, with the goal of processing ALL 1100 movie scripts from the IMSDb website.

We would have to replace the sentence simplification and sentence similarity portions of the old Mappa Mundi with speedier, more precise versions.

On Wed, May 3, 2023 at 4:01 AM Jan Bours wrote:

Looks promising indeed. If you can make your DagAtlas algorithm faster, then you can produce a lot more DAGs from texts, and that will make it more interesting for practical use cases, wouldn't it?


_Originally posted by @rrtucci in https://github.com/rrtucci/mappa_mundi/issues/1#issuecomment-1532907671_

rrtucci commented 1 year ago

I think communicating via this GitHub channel has worked well so far, because it shows with great transparency, to anyone interested in this project, what obstacles might arise. So for now, I am in favor of continuing our communications this way, except that perhaps we might open more than one issue.

I also wanted to suggest that if you can convince one or more Dutch companies to sponsor us, we might be able to make a small non-profit company out of this. I suspect that there are a lot of European companies that want to compete with USA/OpenAI/Google in the LLM arena.


rrtucci commented 1 year ago

Hi Jan, I've been learning about sentence simplification and decided to do it with this software: https://github.com/amanbasu/sentence-simplification The BERT model is not bleeding edge, but it will be very easy to implement and is free under a permissive license. If BERT proves insufficient, then we can try a more sophisticated approach later on. Even if we trade BERT for a more modern model in the future, we will learn a lot of practical Transformer usage tips from the BERT experience.


jbdatascience commented 1 year ago

Interesting sentence simplification software!

I am thinking about our project. Here is what I intend to do, plus some points worth considering:

  • Do we have a good definition of the ultimate goal of this project?
  • Reread your white paper “Causal DAG Extraction from 3 Short Stories and 3 Movie Scripts” to be able to completely understand the principles involved. I saw you have updated that document, but what are the changes?
  • Is the example of DEFT on tic-tac-toe games still useful to consider? Is it still relevant for the purpose of our project?
  • Proposal: to be able to check that the complete MM algorithm functions as expected, it would be good to have a ground truth (texts for which we know the causal DAGs) against which we can check the DAGs discovered by the MM algorithm. I am not sure how to do this yet.
  • The output of the MM algorithm is the set of discovered DAGs (the DagAtlas). How could we interpret them? And what could be a practical use case for this DagAtlas? Perhaps we could describe that in more detail. How could existing LLMs be updated with the DagAtlas so that they could represent causal knowledge (in the most general form)?
  • Are there comparable, already existing algorithms that implement the functionality that MM has? How do they compare?

Please share your thoughts on this!

jbdatascience commented 1 year ago

I posted this quote on LinkedIn (I thought you would like it!):

https://www.linkedin.com/posts/janbours_science-causality-knowledge-activity-7061268335266541568-qF8a

The power of science is its discovery of causal law.

Bertrand Russell. Human Knowledge: Its Scope and Limits. Simon and Schuster, 1948.

https://twitter.com/scitechtalk/status/1655502709267505154?s=46&t=8nmTR-4kmZaFFBCGObS6fA

https://twitter.com/artistexyz/status/1655576091149451269?s=46&t=8nmTR-4kmZaFFBCGObS6fA

Excellent quote. Thanks for sharing.

rrtucci commented 1 year ago

After looking more carefully at the last software I proposed we use for sentence simplification, I realized that it doesn't quite do what we need. It simplifies a complicated sentence, but it doesn't split it into simpler sentences, which is what we need. So I am still looking around for a viable solution for sentence simplification.
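For concreteness, here is a naive, pure-Python baseline of the splitting step we need (my sketch, not anything from the repo): it only handles clauses joined by coordinating conjunctions, and it illustrates why crude heuristics fail, since a regex cannot restore an elided subject.

```python
import re

# Naive baseline: split at coordinating conjunctions only. A sentence with
# an elided subject ("Bob sat down and ate") defeats it, which is one reason
# heuristics like this have such high failure rates.
def naive_split(sentence: str) -> list[str]:
    sentence = sentence.strip().rstrip(".")
    clauses = re.split(r",?\s+(?:and|but|or|so|yet)\s+", sentence)
    return [c[0].upper() + c[1:] + "." for c in (c.strip() for c in clauses) if c]

print(naive_split("Bob ate a green apple, and Alice drank tea."))
# ['Bob ate a green apple.', 'Alice drank tea.']
```

A real solution has to rewrite "Bob sat down and ate" as two sentences that each carry their own subject, which is exactly what the LLM approaches below do well.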

"Do we have a good definition of the ultimate goal of this project?" The immediate goal is to get the 1100 movies processed, and to reduce the sources of noise (like bad sentence simplification and bad similarity evaluation) as much as possible. Will we get a strong signal then, or is this algo junk? That is what we will find out.

" I saw you have updated that document, but what are the changes?" Small changes to improve clarity. None significant. I am always trying to improve the clarity of what I write. That's me. If we ever finish a new, better Mappa Mundi, that will warrant a new paper explaining the changes.

"Is the example of DEFT on tic-tac-toe games still useful to consider?" Not really. The old paper has some good ideas, but the Mappa Mundi algorithm is different and better. The tic-tac-toe example was simple to consider at the time, 7 months ago, but it now feels very artificial to me, compared to the movie scripts example.

The other questions that you ask will best be answered as we go along.


jbdatascience commented 1 year ago

OK, I will also be on the lookout for a viable solution for sentence simplification. Could you specify what we should consider a good solution (and what not)?

Also, I am wondering why the existing simplification solution is not good enough?

jbdatascience commented 1 year ago

Also I am wondering what would be the optimal way of subdividing the text for the purpose of creating the nodes in the DAG. Now we are using sentence clauses, but I can imagine other possible subdivisions of the text. There are many instances of NLP applications where chunking is used for this.

rrtucci commented 1 year ago

When the time comes to evaluate the similarity of two nodes, if the nodes represent very long compound sentences, the nodes will not be well separated in similarity space.

Chunking and breaking into clauses is too crude and often leads to unintelligible sets of words for each node. What is needed is to break every compound sentence into a set of simple, easily understandable, declarative sentences without loss of meaning, and call each of those simple sentences a node.

ChatGPT does this 1-to-N sentence conversion really well. I give an example in the white paper. If we had money, we could just ask ChatGPT to do it; I think ChatGPT could do it for $10 to $100 per movie. What I would love to find is a FREE way of doing what ChatGPT does so well. I've scoured GitHub trying to find a decent solution. Some people have tried to do this in various ways, but their methods have huge failure rates, like 30%.


rrtucci commented 1 year ago

This sounds very promising. https://arxiv.org/pdf/2304.07774v1.pdf


rrtucci commented 1 year ago

This looks promising https://github.com/serenayj/ABCD-ACL2021


jbdatascience commented 1 year ago

OK, you gave a good description of what we expect from a good simplification algorithm.

I find it interesting that ChatGPT is able to produce good simplifications! Could there already be an open-source version of ChatGPT that we could use for this purpose? Also, I wonder whether Hugging Face has any FREE algorithms for doing this? They have a large database of models which I haven't explored fully until now. I will explore those two alternatives in search of a solution.

What is still not clear to me is what is wrong with the simplification method you are using at the moment. In what way is it not good enough?

I also have a question about the existing MM algorithm. When I look at the output of your notebook: https://github.com/rrtucci/mappa_mundi/blob/master/jupyter_notebooks/navigating_short_stories.ipynb

and I look at the DAGs it produces at the bottom of that notebook, I am not sure how to interpret them correctly. Could you take one example and describe it fully so it becomes clear how to interpret it?

rrtucci commented 1 year ago

"What is still not clear to me is what is wrong with the simplification method you are using at the moment. In what way is it not good enough?"

It's too crude. Ideally, you want each node to be assigned a simple, intelligible sentence clearly describing an event. For example, "Bob ate a green apple". All the sentences in a movie script must be reduced to simple sentences like that. The similarity of two sentences only works well if those two sentences are simple.

"I look at the DAGs it produces at the bottom of that notebook, I am not sure how I can interpret them correctly"

They don't make sense because the current method of sentence simplification is not very good. That introduces a lot of noise into the calculation of the similarity of two nodes.
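A toy illustration of the separation problem, using bag-of-words cosine similarity as a crude stand-in for the real sentence-embedding similarity (my sketch): two short, unrelated event sentences score zero, while two long compound sentences about different people score high purely because shared filler words dominate.

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (crude stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    def norm(v):
        return math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(va) * norm(vb))

# Two simple, unrelated event sentences: cleanly separated (similarity 0).
print(cosine("Bob ate a green apple", "Alice walked her dog"))

# Two long compound sentences: connective and filler words swamp the signal.
long_a = "Bob ate a green apple and then he walked to the store and he met a friend"
long_b = "Alice walked her dog and then she went to the store and she met a friend"
print(cosine(long_a, long_b))
```

The same noise shows up in the DAGs: long, unsimplified nodes all look vaguely similar to one another.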


jbdatascience commented 1 year ago

OK, so we must have a new, better method of sentence simplification! That is clear to me now.

jbdatascience commented 1 year ago

I did an interesting experiment with GPT-4 instead of ChatGPT, doing sentence simplification on the same example you give in your white paper:

PROMPT: Consider the following quote: “A range of mountains takes form in the haze. Moving closer. The mountains are piles of TRASH. The entire surface is nothing but waste. “..We’re gonna find adventure in the evening air...” A silhouetted city in the distance. What looks like skyscrapers turns into trash. Thousands of neatly stacked CUBES OF TRASH, stories high. Rows and rows of stacked cubes, like city avenues. They go on for miles.”

Give an enumerated list summarizing this quote. Each list item must be a simple declarative sentence consisting of a subject phrase and a verb phrase. The list items must be in chronological order of occurrence in the quote.

REPLY:


  1. Mountains appear in the haze.
  2. Distance closes.
  3. Mountains transform into trash piles.
  4. Evening air promises adventure.
  5. Silhouetted city emerges.
  6. Skyscrapers reveal themselves as trash.
  7. Cubes of trash form structures.
  8. Trash cube rows extend for miles.

I think the quality of these sentence simplifications is even a lot better than that produced by ChatGPT! What do you think?
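If we adopt an LLM for this step, its numbered reply still has to be turned back into DAG node sentences. A minimal sketch (mine), assuming replies keep the "1. ..." format shown above:

```python
import re

def parse_enumerated_reply(reply: str) -> list[str]:
    """Extract the sentences from an LLM's numbered list, one per DAG node."""
    nodes = []
    for line in reply.splitlines():
        # Accept "1. ..." or "1) ..." item markers; skip everything else.
        m = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if m:
            nodes.append(m.group(1))
    return nodes

reply = """1. Mountains appear in the haze.
2. Distance closes.
3. Mountains transform into trash piles."""
print(parse_enumerated_reply(reply))
# ['Mountains appear in the haze.', 'Distance closes.', 'Mountains transform into trash piles.']
```

Skipping non-list lines also discards chatbot preambles and disclaimers automatically.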

jbdatascience commented 1 year ago

I have found this tweet about Hugging Face LLM support (it is a tweet from today, May 10th 2023!):

https://twitter.com/gpt_index/status/1656091102221570048?s=20

"Today’s release highlight: Native @huggingface LLM support. Want to run any open-source LLM from 🤗? (Camel, StableLM, alpaca, Dolly, etc.)

Now you can in 🦙! (huge s/o to Logan)"

Then I looked further and found this at Huggingface:

https://huggingface.co/LLMs

https://github.com/deep-diver/LLM-As-Chatbot ("LLM as a Chatbot Service"), which comes with a complete notebook for using an LLM as a chatbot:
https://github.com/deep-diver/LLM-As-Chatbot/blob/main/notebooks/llm_as_chatbot_in_colab.ipynb
(raw: https://raw.githubusercontent.com/deep-diver/LLM-As-Chatbot/main/notebooks/llm_as_chatbot_in_colab.ipynb)

I am trying to get this to work and then prompt it with your sentence-simplification example from your white paper. No luck so far, but that is a technical issue; I will try to solve it (and search Hugging Face for other FREE implementations).

But what is more important: I have tried your prompt in a number of different online ChatGPT-like LLMs, and even the simpler ones give excellent replies to your sentence-simplification prompt! See and try for yourself:

  • https://poe.com/Sage
  • https://poe.com/Claude%2B (3 FREE messages per day)
  • https://poe.com/GPT-4 (1 FREE message per day!)
  • https://poe.com/Claude-instant
  • https://poe.com/ChatGPT (FREE!)
  • https://poe.com/Dragonfly

jbdatascience commented 1 year ago

Also I am looking into this Open Source Open ChatGPT variant:

StableVicuna: The New King of Open ChatGPTs? https://www.youtube.com/watch?v=m_xD0algP4k

It is accompanied by this Colab notebook, which uses a GPU T4 runtime: "YT Stable Vicuna13B 8bit in Colab.ipynb" https://colab.research.google.com/drive/1Kvf3qF1TXE-jR-N5G9z1XxVf5z-ljFt2?usp=sharing

It can handle a prompt like this:

raw_output = pipe(get_prompt('Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?'))
parse_text(raw_output)

"They started with 23 apples. After using 20 for lunch, they were left with 3 apples. Then, they bought 6 more apples, so now they have a total of 9 apples. Therefore, the cafeteria has 9 apples in total."

I think it will be a small step to get it to answer your sentence-simplification prompt from your white paper, which I shall try with this notebook. (Up till now it crashes with insufficient-memory errors ...)

rrtucci commented 1 year ago

Yes! I agree. GPT-4 does sentence simplification better than ChatGPT! Excellent experiment. Thanks.


rrtucci commented 1 year ago

Nice! Thanks. Sentence Vectors seem to be part of the foundation of most LLM software, and of some "Knowledge Graph" software, and now of some Causal Inference software. A pleasant surprise.


jbdatascience commented 1 year ago

ANSWER TO MY QUESTION ABOUT THIS (asked under the StableVicuna YouTube video):

I was trying to run the Colab notebook but it crashes with insufficient-memory errors. Is it ONLY possible to run this on the paid Colab Pro version? How could I run this on a free version of Colab, or perhaps even locally on a PC in a Jupyter notebook? Perhaps using the 4-bit version of the LLM, as you mention?

@samwitteveenai (34 minutes ago): "yeah unfortunately you need a GPU with a lot of VRAM to run this so Colab free isn't going to work."

jbdatascience commented 1 year ago

At this moment I am looking into this list of Open Source implementations of LLMs:

GitHub - eugeneyan/open-llms: 🤖 A list of open LLMs available for commercial use.

https://github.com/eugeneyan/open-llms

Trying h2oGPT from that list online with your sentence simplification prompt gives this result (which is not bad I think):

http://pseudotensor.hopto.org:37860/

prompt:

“ Consider the following quote: “A range of mountains takes form in the haze. Moving closer. The mountains are piles of TRASH. The entire surface is nothing but waste. “…We’re gonna find adventure in the evening air…” A silhouetted city in the distance. What looks like skyscrapers turns into trash. Thousands of neatly stacked CUBES OF TRASH, stories high. Rows and rows of stacked cubes, like city avenues. They go on for miles.”

Give an enumerated list summarizing this quote. Each list item must be a simple declarative sentence consisting of a subject phrase and a verb phrase. The list items must be in chronological order of occurrence in the quote. “

OUTPUT:

  1. A range of mountains takes form in the haze.
  2. Moving closer.
  3. The mountains are piles of TRASH.
  4. The entire surface is nothing but waste.
  5. “…We’re gonna find adventure in the evening air…”
  6. A silhouetted city in the distance.
  7. What looks like skyscrapers turns into trash.
  8. Thousands of neatly stacked CUBES OF TRASH, stories high.
  9. Rows and rows of stacked cubes, like city avenues. They go on for miles.

I think this would be acceptable.

Trying this h2oGPT in a notebook on Google Colab (also on Kaggle) gives a memory error.

Maybe we could try: hivemind/gpt-j-6B-8bit · Hugging Face

https://huggingface.co/hivemind/gpt-j-6B-8bit?text=My+name+is+Mariama%2C+my+favorite

It uses some kind of compression (8-bit arithmetic, I think) so it fits within the free Colab tier's memory. I will try it out in this Colab notebook:

finetune-gpt-j-6B-8bit.ipynb - Colaboratory

https://colab.research.google.com/drive/1ft6wQU0BhqG5PRlwgaZJv2VukKKjU4Es#scrollTo=DuW4H6HTS82r
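Rough arithmetic (mine) for why the 8-bit version matters on the free Colab tier, whose T4 GPU has roughly 15 GB of usable VRAM:

```python
# Approximate weight-memory footprint of a 6B-parameter model (GPT-J-6B)
# at different precisions. Weights only; activations and the KV cache add more.
params = 6_000_000_000

def gib(bytes_per_param: float) -> float:
    return params * bytes_per_param / 2**30

print(f"fp32: {gib(4):.1f} GiB")  # ~22.4 GiB: far too big for a free T4
print(f"fp16: {gib(2):.1f} GiB")  # ~11.2 GiB: tight once activations are added
print(f"int8: {gib(1):.1f} GiB")  # ~5.6 GiB: fits comfortably
```

The same arithmetic explains why the 13B StableVicuna model crashed even in 8-bit mode: 13 GB of weights leaves almost no headroom.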

rrtucci commented 1 year ago

Sounds good. It occurs to me that this WALL-E example is not very good, because it doesn't have compound sentences, complex sentences, or compound-complex sentences. It would be nice if the LLM would split those into simple sentences.

To show you what I mean, I asked ChatGPT this: What is a simple, complex, compound and complex compound sentence, with examples?

Here is its answer:

Sure, I'd be happy to provide some examples of simple, complex, compound, and complex-compound sentences.

  1. Simple sentence: A simple sentence consists of a subject and a verb and expresses a complete thought.

Example: The cat sat on the mat.

  2. Complex sentence: A complex sentence consists of an independent clause and a dependent clause.

Example: Although I studied hard, I failed the exam.

  3. Compound sentence: A compound sentence consists of two or more independent clauses joined by a conjunction.

Example: I wanted to go to the park, but it started raining.

  4. Complex-compound sentence: A complex-compound sentence consists of two or more independent clauses and at least one dependent clause.

Example: After I finish my homework, I'll go for a walk, but only if it's not too late.

Note that these are just a few examples of each type of sentence, and there can be many variations and combinations of each type.
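As a rough way to flag which script sentences still need splitting, one could classify them with a naive keyword heuristic. This is my sketch with hypothetical keyword lists; a real implementation would use a parser, since these regexes are easily fooled.

```python
import re

# Hypothetical, incomplete keyword lists for illustration only.
SUBORDINATORS = r"\b(although|because|after|before|when|while|if|since|unless)\b"
COORDINATION = r",\s*(and|but|or|nor|so|yet)\b"

def classify(sentence: str) -> str:
    """Crudely guess a sentence's type from connective keywords."""
    s = sentence.lower()
    has_sub = re.search(SUBORDINATORS, s) is not None
    has_coord = re.search(COORDINATION, s) is not None
    if has_sub and has_coord:
        return "complex-compound"
    if has_coord:
        return "compound"
    if has_sub:
        return "complex"
    return "simple"

for example in ["The cat sat on the mat.",
                "Although I studied hard, I failed the exam.",
                "I wanted to go to the park, but it started raining.",
                "After I finish my homework, I'll go for a walk, but only if it's not too late."]:
    print(classify(example), "->", example)
```

Anything not labeled "simple" would be sent to the LLM for splitting.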


rrtucci commented 1 year ago

This looks VERY promising. https://github.com/explosion/spacy-llm


jbdatascience commented 1 year ago

Sounds good. It occurs to me that this WALL-E example is not very good because it doesn't have compound sentences, complex sentences, or complex-compound sentences. It would be nice if the LLM would split those into simple sentences. To show you what I mean, I asked ChatGPT this: What is a simple, complex, compound and complex-compound sentence, with examples? Here is its answer:

"Sure, I'd be happy to provide some examples of simple, complex, compound, and complex-compound sentences.

1. Simple sentence: A simple sentence consists of a subject and a verb and expresses a complete thought. Example: The cat sat on the mat.
2. Complex sentence: A complex sentence consists of an independent clause and a dependent clause. Example: Although I studied hard, I failed the exam.
3. Compound sentence: A compound sentence consists of two or more independent clauses joined by a conjunction. Example: I wanted to go to the park, but it started raining.
4. Complex-compound sentence: A complex-compound sentence consists of two or more independent clauses and at least one dependent clause. Example: After I finish my homework, I'll go for a walk, but only if it's not too late.

Note that these are just a few examples of each type of sentence, and there can be many variations and combinations of each type."

Then we have to search for more complex texts. Do you have any examples in mind?

jbdatascience commented 1 year ago

This looks VERY promising. https://github.com/explosion/spacy-llm

That looks promising indeed! Have you tried it already?

In the meantime I am looking into this interesting list in my search for a good open-source LLM, ideally one that can run locally on your PC or mine, to perform sentence simplification or other NLP tasks. (Our main goal is to be able to generate a DagAtlas based on, e.g., a large number of texts; if we can run those LLMs locally for long enough, we could reach that goal!)

List of Open Sourced Fine-Tuned Large Language Models (LLM) | by Sung Kim | Geek Culture | Mar, 2023 | Medium

An incomplete list of open-sourced fine-tuned Large Language Models (LLM) you can run locally on your computer

https://medium.com/geekculture/list-of-open-sourced-fine-tuned-large-language-models-llm-8d95a2e0dc76

rrtucci commented 1 year ago

You can choose a paragraph from one of the P.G. Wodehouse stories. If you want really long sentences, I believe Moby Dick by Herman Melville has sentences that are several pages long :) Legal documents have very long sentences too.


rrtucci commented 1 year ago

No. I put in a comment on Twitter to the developers of this software, but got no reply. I've been trying very hard to get some sponsors for this project, but no luck so far. https://twitter.com/artistexyz/status/1657109267323801600


jbdatascience commented 1 year ago

I am searching for a way to find texts for which we know the ground-truth causal relations, so we could test whether the MM algorithm can find these causal relations.

Now I am considering this idea💡 : would it be possible to have LLMs generate texts with custom, predefined causal relations in them? That would give us a way to test whether the MM algorithm can correctly come up with those causal relations in a DAG.

I would like to hear your comments. Do you consider this worth pursuing?

rrtucci commented 1 year ago

These are all interesting research questions to pursue in the future, but I think getting the 1100 movie scripts analyzed should be our immediate highest priority, because it's real world data not simulated data.

Maybe you could start with a 500-node DAG that has a simple sentence attached to each node, and ask an LLM to produce 999 new DAGs that differ from the original DAG by rephrasings of the sentences. Then take your 1000 DAGs and "flatten" them out by transforming them into 1000 short stories. Then feed those 1000 short stories into Mappa Mundi, and see if it can retrieve the original 500-node DAG.
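The "flatten a sentence-labeled DAG into a story" step can be sketched in plain Python. This is a toy illustration under stated assumptions (a random DAG, one canned sentence per node, no LLM rephrasing); all names are illustrative, not part of the repo:

```python
import random

def random_sentence_dag(n_nodes, edge_prob=0.2, seed=0):
    """Build a random DAG whose nodes carry simple sentences.
    Edges only go from lower to higher index, which guarantees acyclicity."""
    rng = random.Random(seed)
    sentences = {i: f"Event {i} happens." for i in range(n_nodes)}
    edges = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)
             if rng.random() < edge_prob]
    return sentences, edges

def flatten_to_story(sentences, edges):
    """'Flatten' the DAG into a short story: one sentence per node, in an
    order consistent with the arrows (Kahn's topological sort)."""
    indegree = {i: 0 for i in sentences}
    for _, j in edges:
        indegree[j] += 1
    ready = sorted(i for i in sentences if indegree[i] == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for a, b in edges:
            if a == node:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return " ".join(sentences[i] for i in order)

sentences, edges = random_sentence_dag(10)
story = flatten_to_story(sentences, edges)
print(story)
```

Each story produced this way is guaranteed to mention causes before effects, so the original DAG remains recoverable in principle.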


rrtucci commented 1 year ago

Another thing that occurs to me is to analyze 10,000 games of chess with Mappa Mundi, and then ask a chess expert to judge whether the resulting DAGs make sense to him/her. The ground truth in this case is the chess expert :)


rrtucci commented 1 year ago

Note that for the chess example, the sentences are moves of chess pieces. There is no need for sentence simplification, and sentence similarity is just equality of moves. Hence the chess example requires no NLP whatsoever. It is trivial compared to the movie scripts example.
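A toy illustration of why the chess case needs no NLP: treating each move as a "sentence", similarity is just string equality, and a crude precedence graph can be built by counting which move immediately follows which across games. This is only a sketch of the idea, not Mappa Mundi's actual algorithm:

```python
def moves_to_edges(games):
    """Count how often move b immediately follows move a across games.
    'Sentence similarity' here is just string equality of moves."""
    edges = {}
    for game in games:
        for a, b in zip(game, game[1:]):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges

games = [
    ["e4", "e5", "Nf3", "Nc6"],  # two toy openings
    ["e4", "e5", "Bc4", "Nc6"],
]
edges = moves_to_edges(games)
print(edges)
```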


rrtucci commented 1 year ago

This article has some simple code that might do the job of sentence simplification for us. https://medium.com/inspiredbrilliance/knowledge-graphs-from-complex-text-eb009aeed48e I am trying to contact the author to get some references from her.


jbdatascience commented 1 year ago

This article has some simple code that might do the job of sentence simplification for us. https://medium.com/inspiredbrilliance/knowledge-graphs-from-complex-text-eb009aeed48e I am trying to contact the author to get some references from her.


I will have to read it before I can comment, but I will do that!

In the meantime I am going through a lot of alternatives for us to use.

My impression at this point is that there are a lot of initiatives concerning practical use cases of Natural Language Processing with LLMs, and a lot of open-source implementations. It is complicated to discern which ones are practical for our project. I think you have come to the same conclusion ...

But I am testing a lot of them, with a focus on sentence simplification, which you have indicated is crucial to the main goal: identifying causal relationships purely from texts.

By "practical" I mean being able to produce a lot of DAGs purely from texts. We have already seen that modern-day LLMs can perform sentence simplification; if we can run this NLP task locally on our PCs with open-source LLMs, that would enable us to simplify a large set of texts and would help enormously in accomplishing our main goal.

What is your vision on this?

rrtucci commented 1 year ago

The problem with using LLMs to do the sentence simplification is that I have zero funds and my computer only has 8 GB of RAM and no GPU. So I need to find a free way of doing the sentence simplification on my computer. I am pretty sure now this can be done by using software that "extracts Knowledge Graph triples (subject, verb, object) from text data". Here is a new reference I found today that does this: https://github.com/rishabkr/Knowledege-Graphs-and-NLP/blob/main/Knowledge%20Graphs/Knowledge%20Graphs%20with%20NLP%20.ipynb

rrtucci commented 1 year ago

I think I finally found a mature library that can do 1 to N sentence simplification quickly. The idea is to convert a complicated sentence into many "RDF triples" and use those as simplified sentences https://en.wikipedia.org/wiki/Semantic_triple

In Java https://stanfordnlp.github.io/CoreNLP/openie.html

Python wrapper https://pypi.org/project/stanford-openie/

rrtucci commented 1 year ago

Sigh. So I got openie running on my computer, and asked it to extract RDF triples from this sentence, which I know is difficult:

The man, who had never liked the words "booby" and "boobyhatch," and who liked them even less on a shining morning when there was a unicorn in the garden, thought for a moment.

openie found only one triple: (unicorn, in the, garden)

Then I asked chatgpt the following:

split the following sentence into RDF triples: The man, who had never liked the words "booby" and "boobyhatch," and who liked them even less on a shining morning when there was a unicorn in the garden, thought for a moment.

ChatGPT trounced openie's answer. It responded excellently as follows:

Subject: The man
Predicate: thought for a moment
Object: None

Subject: The man
Predicate: had never liked
Object: the words "booby" and "boobyhatch"

Subject: The man
Predicate: liked even less
Object: the words "booby" and "boobyhatch" on a shining morning when there was a unicorn in the garden.

Subject: There
Predicate: was
Object: a unicorn in the garden on a shining morning.

rrtucci commented 1 year ago

So, I've totally given up on OpenIE. After many experiments, I have concluded that it gives terrible results.

I've been using the following prompt:

Split the following text into RDF triples. State the result as "triples = " followed by a Python list of tuples: Robert, who lives nearby, was walking his dog. While eating food Ram is singing

Here is ChatGPT's answer

triples = [ ('Robert', 'was walking', 'his dog'), ('Robert', 'lives nearby', None), ('Ram', 'is singing', 'while eating food') ]

And here is Bard's answer

triples = [ ("Robert", "lives nearby"), ("Robert", "was walking his dog"), ("Ram", "is singing"), ("Ram", "is eating food") ]
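Since the prompt asks the model to state its result as "triples = " followed by a Python list of tuples, the reply can be parsed mechanically. A minimal sketch, assuming the model complies with the requested format (`parse_triples` is an illustrative helper, not part of the repo):

```python
import ast
import re

def parse_triples(llm_reply):
    """Extract the Python list that follows 'triples =' in an LLM reply.
    Returns a list of tuples; raises ValueError if no list is found."""
    match = re.search(r"triples\s*=\s*(\[.*\])", llm_reply, re.DOTALL)
    if match is None:
        raise ValueError("no 'triples = [...]' found in reply")
    return [tuple(t) for t in ast.literal_eval(match.group(1))]

# ChatGPT's answer above, verbatim:
reply = """triples = [
    ('Robert', 'was walking', 'his dog'),
    ('Robert', 'lives nearby', None),
    ('Ram', 'is singing', 'while eating food')
]"""
print(parse_triples(reply))
```

`ast.literal_eval` only evaluates literal structures, so a malformed or malicious reply raises an exception instead of executing code.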

jbdatascience commented 1 year ago

It seems that ChatGPT's answers are very good. I read somewhere that ChatGPT is based on GPT-3.5 (I don't know if that is still the case though). So my conclusion is that a FREE open-source variant of ChatGPT would be the most promising option for us, to at least do a Proof of Concept (POC) on a large number of texts (I think you prefer using the movie scripts for this, is that correct?).

At this very moment I am doing some experiments with such a FREE open-source variant of ChatGPT (GPT-4 even!). I hope that these experiments turn out well; I will also use your examples with triples to see if those also turn out well. I will try to report the outcomes of those experiments in a couple of days, so please bear with me ...

rrtucci commented 1 year ago

Sounds excellent. Thanks. In the meantime, I will prepare a blog post explaining this problem and asking the public for suggestions.


jbdatascience commented 1 year ago


That is a good idea, because it is difficult to find good open-source software for our purposes, given that there are so many initiatives. Choosing among them is not easy!

jbdatascience commented 1 year ago

Up till now I have no foolproof, convincing examples of RDF extraction from texts with FREE open-source LLMs. But my search goes on.

What is really convincing are the results of RDF extraction from texts with LLMs such as ChatGPT and GPT-4! There are other LLMs that can also perform RDF extraction from texts; below I show my results on a number of them:

POE SAGE:

https://poe.com/Sage

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE:

I assume you want me to extract the subject-predicate-object triples from the given text and represent them as RDF triples. Here's the list of triples:

triples = [
    ('Robert', 'lives nearby', ' '),
    ('Robert', 'walking', 'dog'),
    ('Ram', 'eating', 'food'),
    ('Ram', 'singing', '')
]

Note that in the first triple, the object is an empty string because the sentence doesn't provide any information about what Robert's location is nearby.


POE CLAUDE+ :

https://poe.com/Claude%2B

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

triples = [ ("Robert", "lives nearby", None), ("Robert", "was walking", "his dog"), ("Ram", "is", "singing"), ("Ram", "eating", "food")]


POE CLAUDE-INSTANT :

https://poe.com/Claude-instant

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

triples = [ ('Robert', 'lives near','walking his dog'), ('Robert','walking his dog', ''), ('While eating food', 'Ram','is singing') ]


POE DRAGONFLY :

https://poe.com/Dragonfly

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

triples = [("Robert", "lives nearby", True), ("Robert", "was walking", "dog"), ("Ram", "was eating", "food"), ("Ram", "was singing", True)]


POE DRAGONFLY : https://poe.com/s/8UsuBs1jNgbBYQXXGjKa

Can you explain how the AI chatbot generates RDF triples?

Sure! An AI chatbot can generate RDF triples by analyzing natural language and extracting subject-predicate-object relationships from the text. For example, in the sentence "Robert, who lives nearby, was walking his dog," the subject is "Robert," the predicate is "lives nearby" and the object is "dog." The AI chatbot then stores this information as an RDF triple in the form of ("Robert", "lives nearby", "dog").

Can the AI chatbot generate RDF triples from any type of text?

Yes, the AI chatbot can generate RDF triples from any type of text, as long as it is in natural language. The AI chatbot can analyze the text and extract the subject-predicate-object relationships from the text, and store them as RDF triples.

How does the AI chatbot handle text with complex sentence structures?

The AI chatbot is able to handle complex sentence structures by analyzing the text and extracting the subject-predicate-object relationships from the text. For example, if the text contains a sentence with multiple clauses, the AI chatbot will analyze each clause and extract the subject-predicate-object relationships from each clause. The AI chatbot will then store this information as an RDF triple in the form of (subject, predicate, object).

How does the AI chatbot determine the subject-predicate-object relationships in a sentence?

The AI chatbot uses natural language processing techniques to analyze the text and identify subject-predicate-object relationships. For example, the AI chatbot will analyze the syntax of the sentence and look for nouns, verbs, and other grammatical structures to determine the subject, predicate, and object of the sentence.

How does the AI chatbot handle sentences with multiple subjects or objects?

The AI chatbot is able to handle sentences with multiple subjects or objects by extracting the subject-predicate-object relationships from each subject or object. For example, if the sentence contains multiple subjects or objects, the AI chatbot will analyze each subject or object and extract the subject-predicate-object relationships from each one. The AI chatbot will then store this information as separate RDF triples in the form of (subject, predicate, object).

Does the AI chatbot use any specific algorithms to extract subject-predicate-object relationships?

Yes, the AI chatbot uses natural language processing techniques, such as part-of-speech tagging, dependency parsing, and semantic analysis, to identify subject-predicate-object relationships in a sentence. These techniques help the AI chatbot to accurately identify the subject, predicate, and object of the sentence.

How does the AI chatbot use semantic analysis to identify subject-predicate-object relationships?

The AI chatbot uses semantic analysis to identify subject-predicate-object relationships by analyzing the meaning of the words in a sentence. This helps the AI chatbot to accurately identify the subject, predicate, and object of the sentence, even if the sentence contains multiple subjects or objects.

What are some challenges the AI chatbot faces when identifying subject-predicate-object relationships?

The AI chatbot faces several challenges when identifying subject-predicate-object relationships, such as ambiguity in the text, lack of context, and complex sentence structures. The AI chatbot must be able to accurately identify the subject, predicate, and object of the sentence in order to generate the correct RDF triples. Additionally, the AI chatbot must be able to handle ambiguity and lack of context in order to accurately identify subject-predicate-object relationships.

Can you give an example of a complex sentence structure that the AI chatbot might struggle with?

Yes, the AI chatbot might struggle with a sentence such as "John, who was walking his dog, saw Mary, who was eating an apple." In this sentence, there are two subjects and two objects, and the AI chatbot must be able to accurately identify the subject-predicate-object relationships in order to generate the correct RDF triples.

The AI chatbot must be able to accurately identify the subject, predicate, and object of the sentence in order to generate the correct RDF triples. This can be a challenge, especially with complex sentences such as this one. The AI chatbot must be able to analyze the syntax of the sentence and look for nouns, verbs, and other grammatical structures in order to accurately identify the subject, predicate, and object of the sentence. Additionally, the AI chatbot must be able to handle ambiguity and lack of context in order to accurately identify subject-predicate-object relationships.

What are some techniques that the AI chatbot might use to accurately identify the subject, predicate, and object of a sentence?

The AI chatbot might use several techniques to accurately identify the subject, predicate, and object of a sentence, such as natural language processing techniques, including part-of-speech tagging, dependency parsing, and semantic analysis. Additionally, the AI chatbot might use machine learning algorithms to identify subject-predicate-object relationships in a sentence. These techniques help the AI chatbot to accurately identify the subject, predicate, and object of the sentence, even if the sentence contains multiple subjects or objects.

What are some examples of machine learning algorithms that can identify subject-predicate-object relationships?

Some examples of machine learning algorithms that can identify subject-predicate-object relationships include supervised machine learning algorithms, such as support vector machines and decision trees, and unsupervised machine learning algorithms, such as clustering and latent semantic analysis. These algorithms can help the AI chatbot to accurately identify the subject, predicate, and object of the sentence, even if the sentence contains multiple subjects or objects.

Can you explain how clustering algorithms identify subject-predicate-object relationships?

Clustering algorithms identify subject-predicate-object relationships by analyzing the text and grouping related words together. For example, if the text contains the words "John," "walk," and "dog," the clustering algorithm will group these words together and identify the subject-predicate-object relationship of "John was walking his dog." The AI chatbot will then store this information as an RDF triple in the form of ("John", "was walking", "dog").


POE NeevaAI :

https://poe.com/NeevaAI

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

I couldn't find an exact answer, but here are some search results that might help:

——-

stackoverflow.com - How to convert Text file/document into RDF triples?

——-

arxiv.org - Triple-to-Text: Converting RDF Triples into High-Quality Natural ...

ABSTRACT Knowledge base is one of the main forms to represent information in a structured way. A knowledge base typically consists of Resource Description Framework (RDF) triples which describe the entities and their relations. Generating natural language description of the knowledge base is an important task in NLP, which has been formulated as a conditional language generation task and tackled using the sequence-to-sequence framework. Current works mostly train the language models by maximum likelihood estimation, which tends to generate lousy sentences. In this paper, we argue that such a problem of maximum likelihood estimation is intrinsic, which is generally irrevocable via changing network structures. Accordingly, we propose a novel Triple-to-Text (T2T) framework, which approximately optimizes the inverse Kullback-Leibler (KL) divergence between the distributions of the real and generated sentences. Due to the nature that inverse KL imposes large penalty on fake-looking samples, the proposed method can significantly reduce the probability of generating low-quality sentences. Our experiments on three real-world datasets demonstrate that T2T can generate higher-quality sentences and outperform baseline models in several evaluation metrics.

CCS CONCEPTS • Computing methodologies → Natural language generation;

KEYWORDS Natural Language Generation, Sequence to Sequence Generation, Knowledge Bases

Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence https://arxiv.org/pdf/1906.01965.pdf

——-

yassine-hamoudi.github.io - Extracting RDF triples using the Stanford Parser

https://yassine-hamoudi.github.io/files/other/RDFTriples.pdf

——-

Explore more at neeva.com

Projet Pensées Profondes https://projetpp.github.io/

ProjetPP/PPP-QuestionParsing-ML-Standalone https://github.com/ProjetPP/PPP-QuestionParsing-ML-Standalone


POE https://poe.com/PromptGenius :

PromptGenius

Operated by

@LuipY The bot can enhance simple prompts, and I recommend using it by saying “My prompt is "…".” You can use this before you use GPT-4, so you can get the most out of it.

This bot uses a model that is powered by OpenAI. Give me your prompt so I can improve it.

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

Possible answers to the prompt:

  1. [("Robert", "lives nearby", ""), ("Robert", "was walking", "his dog"), ("Ram", "is singing", "while eating food")]
  2. Convert the given text into RDF triples:

    triples = [("Robert", "lives nearby", ""), ("Robert", "was walking", "his dog"), ("Ram", "is singing", "while eating food")]

  3. Questions:

    a) What is the context or purpose of this task?

    b) Do you have any specific requirements or constraints for the RDF triples format?

    c) Is there any additional information you would like to include in the triples?


POE CHATGPT :

https://poe.com/ChatGPT

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

https://poe.com/s/G2jaPIyA48oHlrXjy4cr

I apologize, but the given text still cannot be directly expressed as RDF triples as it lacks any clear semantic structure. However, I can provide an example of how the text could be modified to create RDF triples:

Original text: "Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

Modified text: "Robert walks his dog near his home. Ram sings while eating food."

Here are the RDF triples that can be extracted from the modified text:


triples = [
    ("Robert", "walks", "his dog"),
    ("his dog", "is walked near", "his home"),
    ("Ram", "sings", "while eating food"),
]

Note that these triples are just one possible interpretation of the modified text, and other interpretations may yield different RDF triples. Additionally, the semantics of these triples may not accurately reflect the intended meaning of the original text.

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples. "Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

I apologize, but the given text still cannot be directly expressed as RDF triples as it lacks any clear semantic structure. However, I can provide an example of how the text could be modified to create RDF triples:

Modified text: "Robert lives nearby and walks his dog. Ram sings while eating food."

Here are the RDF triples that can be extracted from the modified text:

triples = [
    ("Robert", "lives nearby", ""),
    ("Robert", "walks", "his dog"),
    ("Ram", "sings", ""),
    ("Ram", "eats", "food"),
]

Note that these triples are just one possible interpretation of the modified text, and other interpretations may yield different RDF triples. Additionally, the semantics of these triples may not accurately reflect the intended meaning of the original text.

Context cleared

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples. "Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

Sorry, but as a language model, I don't have the capability to execute Python code directly. However, here are the RDF triples for the given text:

triples = [
  ("Robert", "livesNearby", "nearby"),
  ("Robert", "isWalking", "hisDog"),
  ("Ram", "isEating", "food"),
  ("Ram", "isSinging", None)
]

Note that the RDF triples represent the relationships between the entities in the text. In this case, we have two entities, "Robert" and "Ram", and several relationships between them, such as "livesNearby", "isWalking", "isEating", and "isSinging". The third element in each tuple represents an object or value associated with the relationship. If there is no associated object or value (as in the case of Ram singing), it is represented as None.

Tell me more about RDF triples.

Sure! RDF triples are a way of representing data in the Resource Description Framework (RDF), which is a standard for describing resources on the web. RDF triples consist of three parts: a subject, a predicate, and an object.

The subject represents the resource being described, usually identified by a URI (Uniform Resource Identifier). The predicate represents a property or attribute of the resource, also identified by a URI. Finally, the object represents the value of the property or attribute.

For example, let's say we want to represent the fact that "John is a person". We can do this using an RDF triple:


<http://example.com/john> <http://example.com/isA> <http://example.com/person>

Here, the subject is the URI for John, the predicate is the URI for "is a", and the object is the URI for "person". Note that the URIs don't have to be real web addresses - they just need to be unique identifiers for the resources being described.

RDF triples can be used to represent many types of data, including metadata about web resources, scientific data, and more. They are a flexible and powerful way of representing structured data on the web.

RDF triples can be used to represent a wide variety of information, including metadata about web resources, scientific data, social network relationships, and more. They are often used in combination with other technologies such as SPARQL (a query language for RDF data) and OWL (a language for defining ontologies). In RDF, the subject and object of a triple can be any resource, including another triple. This allows for the creation of complex and nested data structures, which can be used to represent complex relationships between resources.

RDF triples can be serialized in several different formats, including RDF/XML, Turtle, and JSON-LD. These formats allow the RDF data to be easily exchanged and processed by different software systems.

One important use case for RDF is the creation of linked data, which involves creating a web of interconnected resources using RDF triples and URIs. Linked data allows for easier discovery and integration of information from different sources, making it easier to build applications that can combine data from multiple sources.

Overall, RDF triples provide a powerful and flexible way of representing structured data on the web, and are an important component of the Semantic Web.
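To make the subject-predicate-object structure concrete, here is a minimal sketch in plain Python (no RDF library; the `to_turtle` helper is made up for illustration) that stores triples as tuples and serializes them in a Turtle-like form:

```python
# Minimal sketch: RDF triples as plain Python tuples, serialized in a
# Turtle-like form. Illustrative only; a real application would use an
# RDF library, and the to_turtle helper here is hypothetical.

triples = [
    ("http://example.com/john", "http://example.com/isA", "http://example.com/person"),
]

def to_turtle(triples):
    """Render (subject, predicate, object) URI triples as Turtle statements."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

print(to_turtle(triples))
```

This prints the same statement shown above, one `<s> <p> <o> .` line per triple.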

How does SPARQL work with RDF data?

SPARQL (SPARQL Protocol and RDF Query Language) is a query language for RDF data. It provides a way to query and retrieve data from RDF graphs using a syntax similar to SQL.

SPARQL queries are made up of several parts, including a SELECT clause that specifies the variables to be returned, a WHERE clause that specifies the patterns to be matched in the RDF graph, and an optional ORDER BY clause that specifies the order in which the results should be returned.

The WHERE clause is the most important part of a SPARQL query, as it specifies the patterns to be matched in the RDF graph. These patterns are expressed using RDF triples, and can include variables that can match any resource in the graph.

For example, let's say we have the following RDF data that describes some books and their authors:


@prefix ex: <http://example.com/> .
ex:book1 ex:title "The Great Gatsby" ;
        ex:author ex:fitzgerald .
ex:book2 ex:title "To Kill a Mockingbird" ;
        ex:author ex:lee .

We can use SPARQL to query this data and retrieve the titles of all books and their authors:


SELECT ?title ?author
WHERE {
  ?book ex:title ?title ;
        ex:author ?author .
}

Here, we are using variables "?title" and "?author" in the SELECT clause to indicate that we want to retrieve the values of the "title" and "author" properties for each book. In the WHERE clause, we are using RDF triples to specify the patterns to be matched - in this case, we want to match any resource that has a "title" property and an "author" property.

When we execute this query, we will get back a result set that includes the titles and authors of all books in the RDF graph:


title                   | author
------------------------|--------
"The Great Gatsby"      | ex:fitzgerald
"To Kill a Mockingbird" | ex:lee

SPARQL is a powerful tool for querying RDF data and extracting information from it. It allows for complex queries that can retrieve information from multiple sources and combine it in interesting ways, making it a key technology for the Semantic Web.
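The pattern matching performed by the WHERE clause can be imitated in a few lines of plain Python. This toy matcher (a sketch, not a SPARQL engine; all names here are made up for illustration) binds `?`-prefixed variables against a list of triples and joins the bindings:

```python
# Toy pattern matcher illustrating how a SPARQL WHERE clause binds
# variables against a set of triples. Variables start with "?".
# This is a sketch, not a SPARQL engine.

data = [
    ("ex:book1", "ex:title", "The Great Gatsby"),
    ("ex:book1", "ex:author", "ex:fitzgerald"),
    ("ex:book2", "ex:title", "To Kill a Mockingbird"),
    ("ex:book2", "ex:author", "ex:lee"),
]

def match(pattern, triple, bindings):
    """Unify one (s, p, o) pattern with one triple, extending bindings."""
    new = dict(bindings)
    for pat, val in zip(pattern, triple):
        if pat.startswith("?"):            # a variable: bind it, or check it
            if pat in new and new[pat] != val:
                return None
            new[pat] = val
        elif pat != val:                   # a constant: must match exactly
            return None
    return new

def query(patterns, data):
    """Return every set of variable bindings satisfying all patterns (a join)."""
    results = [{}]
    for pattern in patterns:
        results = [b2 for b in results for t in data
                   if (b2 := match(pattern, t, b)) is not None]
    return results

# Equivalent of:
#   SELECT ?title ?author WHERE { ?book ex:title ?title ; ex:author ?author . }
rows = query([("?book", "ex:title", "?title"),
              ("?book", "ex:author", "?author")], data)
for r in rows:
    print(r["?title"], "|", r["?author"])
```

Running this reproduces the two-row result table above, one binding per book.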

Can SPARQL query data from multiple RDF graphs?

Yes, SPARQL can query data from multiple RDF graphs. In SPARQL, a graph is identified by its URI, and data can be queried from multiple graphs using the GRAPH keyword.

The GRAPH keyword is used to specify the graph that should be queried. For example, let's say we have two RDF graphs that describe books and their authors, and we want to query both graphs to retrieve the titles of all books and their authors:


@prefix ex1: <http://example.com/books/> .
@prefix ex2: <http://example.com/authors/> .
ex1:book1 ex1:title "The Great Gatsby" ;
         ex1:author ex2:fitzgerald .

ex1:book2 ex1:title "To Kill a Mockingbird" ;
         ex1:author ex2:lee .

ex2:fitzgerald ex2:name "F. Scott Fitzgerald" .
ex2:lee ex2:name "Harper Lee" .

We can use SPARQL to query both graphs and retrieve the titles of all books and their authors:


SELECT ?title ?authorName
WHERE {
  GRAPH <http://example.com/books> {
    ?book ex1:title ?title ;
          ex1:author ?author .
  }
  GRAPH <http://example.com/authors> {
    ?author ex2:name ?authorName .
  }
}

Here, we are using the GRAPH keyword to specify the two graphs that we want to query. In the WHERE clause, we are using RDF triples to specify the patterns to be matched in each graph. Note that we are using the variable "?author" to link the book and author graphs together.

When we execute this query, we will get back a result set that includes the titles of all books and the names of their authors:


title                   | authorName
------------------------|--------------
"The Great Gatsby"      | "F. Scott Fitzgerald"
"To Kill a Mockingbird" | "Harper Lee"

By using the GRAPH keyword, we can query data from multiple RDF graphs and combine the results in interesting ways. This allows for the creation of more complex and powerful applications that can work with data from multiple sources.
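The GRAPH idea can be sketched as a toy model in plain Python (illustrative only, not real SPARQL semantics; all names are made up): a dataset maps each graph URI to its own triple list, and the join that the `?author` variable performs becomes an ordinary dictionary lookup:

```python
# Toy model of SPARQL named graphs: a dataset maps each graph URI to
# its own list of triples, and a cross-graph query joins them.
# Illustrative sketch only; not real SPARQL semantics.

dataset = {
    "http://example.com/books": [
        ("ex1:book1", "ex1:title", "The Great Gatsby"),
        ("ex1:book1", "ex1:author", "ex2:fitzgerald"),
        ("ex1:book2", "ex1:title", "To Kill a Mockingbird"),
        ("ex1:book2", "ex1:author", "ex2:lee"),
    ],
    "http://example.com/authors": [
        ("ex2:fitzgerald", "ex2:name", "F. Scott Fitzgerald"),
        ("ex2:lee", "ex2:name", "Harper Lee"),
    ],
}

books = dataset["http://example.com/books"]
# Index each graph by the property we need, like a GRAPH-scoped pattern.
titles = {s: o for s, p, o in books if p == "ex1:title"}
authors = {s: o for s, p, o in dataset["http://example.com/authors"]
           if p == "ex2:name"}

# Join on the author resource, as the ?author variable does in SPARQL.
rows = [(titles[s], authors[o]) for s, p, o in books if p == "ex1:author"]
for title, name in rows:
    print(title, "|", name)
```

This yields the same title/authorName table as the GRAPH query above.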

In addition to the GRAPH keyword, SPARQL provides several other features for querying data from multiple RDF graphs.

One such feature is the ability to use UNION to combine the results of two or more queries. UNION allows us to retrieve data from multiple graphs that have a similar structure, and combine the results into a single result set.

For example, let's say we have two RDF graphs that describe books and their authors, and we want to retrieve the titles of all books and the names of their authors, regardless of which graph they are in:


@prefix ex1: <http://example.com/books/> .
@prefix ex2: <http://example.com/authors/> .
ex1:book1 ex1:title "The Great Gatsby" ;
         ex1:author ex2:fitzgerald .

ex1:book2 ex1:title "To Kill a Mockingbird" ;
         ex1:author ex2:lee .

ex2:fitzgerald ex2:name "F. Scott Fitzgerald" .
ex2:lee ex2:name "Harper Lee" .

We can use SPARQL to query both graphs and combine the results using UNION:


SELECT ?title ?authorName
WHERE {
  {
    GRAPH <http://example.com/books> {
      ?book ex1:title ?title ;
            ex1:author ?author .
    }
    GRAPH <http://example.com/authors> {
      ?author ex2:name ?authorName .
    }
  } UNION {
    GRAPH <http://example.com/authors> {
      ?author ex2:name ?authorName .
    }
    GRAPH <http://example.com/books> {
      ?book ex1:title ?title ;
            ex1:author ?author .
    }
  }
}

Here, we are using UNION to combine two queries - one that retrieves the titles of books and the names of their authors, and another that retrieves the names of authors and the titles of their books. By using UNION, we can combine the results of these two queries into a single result set.

When we execute this query, we will get back a result set that includes the titles of all books and the names of all authors:


title                   | authorName
------------------------|--------------
"The Great Gatsby"      | "F. Scott Fitzgerald"
"To Kill a Mockingbird" | "Harper Lee"

Overall, SPARQL provides several powerful features for querying data from multiple RDF graphs, including the ability to use UNION and the GRAPH keyword. These features allow us to work with complex and diverse data sources and build applications that can combine data from multiple sources.
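As a toy illustration in plain Python (the `union` helper is made up for illustration), UNION amounts to evaluating two pattern groups separately and merging their solution lists while dropping duplicate rows:

```python
# Toy illustration of UNION: evaluate two pattern groups separately,
# then merge their solution lists while dropping duplicates.
# The union helper is hypothetical, for illustration only.

rows_a = [("The Great Gatsby", "F. Scott Fitzgerald"),
          ("To Kill a Mockingbird", "Harper Lee")]
rows_b = [("To Kill a Mockingbird", "Harper Lee")]   # overlapping solutions

def union(*result_sets):
    """Concatenate result sets, keeping the first occurrence of each row."""
    seen, merged = set(), []
    for rows in result_sets:
        for row in rows:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged

print(union(rows_a, rows_b))
```

Because the second group's only solution already appears in the first, the merged result is the same two-row table shown above.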


POE GPT4 :

GPT-4 - Poe

https://poe.com/GPT-4

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

triples = [
    ("Robert", "lives", "nearby"),
    ("Robert", "was walking", "his dog"),
    ("Ram", "eating", "food"),
    ("Ram", "is singing", None)
]


rrtucci commented 1 year ago

Excellent work, Jan. Very instructive and fascinating. I find it so amazing how good AI is today. As far as I'm concerned, the Turing test has been passed already.

On Mon, May 15, 2023 at 3:49 PM Jan Bours @.***> wrote:

Up till now I have no foolproof, convincing examples of RDF extraction from texts with FREE open-source LLMs. But my search goes on.

What is really convincing are the results of RDF extraction from texts with LLMs such as ChatGPT and GPT-4! Several LLMs are able to perform RDF extraction from texts; here I show my results on a number of them:

POE SAGE:

https://poe.com/Sage

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE:

I assume you want me to extract the subject-predicate-object triples from the given text and represent them as RDF triples. Here's the list of triples:

triples = [
    ('Robert', 'lives nearby', ' '),
    ('Robert', 'walking', 'dog'),
    ('Ram', 'eating', 'food'),
    ('Ram', 'singing', '')
]

Note that in the first triple, the object is an empty string because the sentence doesn't provide any information about what Robert's location is nearby.

POE CLAUDE+ :

https://poe.com/Claude%2B

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

triples = [
    ("Robert", "lives nearby", None),
    ("Robert", "was walking", "his dog"),
    ("Ram", "is", "singing"),
    ("Ram", "eating", "food")
]

POE CLAUDE-INSTANT :

https://poe.com/Claude-instant

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

triples = [
    ('Robert', 'lives near', 'walking his dog'),
    ('Robert', 'walking his dog', ''),
    ('While eating food', 'Ram', 'is singing')
]

POE DRAGONFLY :

https://poe.com/Dragonfly

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

triples = [
    ("Robert", "lives nearby", True),
    ("Robert", "was walking", "dog"),
    ("Ram", "was eating", "food"),
    ("Ram", "was singing", True)
]

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

POE DRAGONFLY : https://poe.com/s/8UsuBs1jNgbBYQXXGjKa

Can you explain how the AI chatbot generates RDF triples?

Sure! An AI chatbot can generate RDF triples by analyzing natural language and extracting subject-predicate-object relationships from the text. For example, in the sentence "Robert, who lives nearby, was walking his dog," the subject is "Robert," the predicate is "lives nearby" and the object is "dog." The AI chatbot then stores this information as an RDF triple in the form of ("Robert", "lives nearby", "dog").

Can the AI chatbot generate RDF triples from any type of text?

Yes, the AI chatbot can generate RDF triples from any type of text, as long as it is in natural language. The AI chatbot can analyze the text and extract the subject-predicate-object relationships from the text, and store them as RDF triples.

How does the AI chatbot handle text with complex sentence structures?

The AI chatbot is able to handle complex sentence structures by analyzing the text and extracting the subject-predicate-object relationships from the text. For example, if the text contains a sentence with multiple clauses, the AI chatbot will analyze each clause and extract the subject-predicate-object relationships from each clause. The AI chatbot will then store this information as an RDF triple in the form of (subject, predicate, object).

How does the AI chatbot determine the subject-predicate-object relationships in a sentence?

The AI chatbot uses natural language processing techniques to analyze the text and identify subject-predicate-object relationships. For example, the AI chatbot will analyze the syntax of the sentence and look for nouns, verbs, and other grammatical structures to determine the subject, predicate, and object of the sentence.

How does the AI chatbot handle sentences with multiple subjects or objects?

The AI chatbot is able to handle sentences with multiple subjects or objects by extracting the subject-predicate-object relationships from each subject or object. For example, if the sentence contains multiple subjects or objects, the AI chatbot will analyze each subject or object and extract the subject-predicate-object relationships from each one. The AI chatbot will then store this information as separate RDF triples in the form of (subject, predicate, object).

Does the AI chatbot use any specific algorithms to extract subject-predicate-object relationships?

Yes, the AI chatbot uses natural language processing techniques, such as part-of-speech tagging, dependency parsing, and semantic analysis, to identify subject-predicate-object relationships in a sentence. These techniques help the AI chatbot to accurately identify the subject, predicate, and object of the sentence.

How does the AI chatbot use semantic analysis to identify subject-predicate-object relationships?

The AI chatbot uses semantic analysis to identify subject-predicate-object relationships by analyzing the meaning of the words in a sentence. This helps the AI chatbot to accurately identify the subject, predicate, and object of the sentence, even if the sentence contains multiple subjects or objects.

What are some challenges the AI chatbot faces when identifying subject-predicate-object relationships?

The AI chatbot faces several challenges when identifying subject-predicate-object relationships, such as ambiguity in the text, lack of context, and complex sentence structures. The AI chatbot must be able to accurately identify the subject, predicate, and object of the sentence in order to generate the correct RDF triples. Additionally, the AI chatbot must be able to handle ambiguity and lack of context in order to accurately identify subject-predicate-object relationships.

Can you give an example of a complex sentence structure that the AI chatbot might struggle with?

Yes, the AI chatbot might struggle with a sentence such as "John, who was walking his dog, saw Mary, who was eating an apple." In this sentence, there are two subjects and two objects, and the AI chatbot must be able to accurately identify the subject-predicate-object relationships in order to generate the correct RDF triples.

The AI chatbot must be able to accurately identify the subject, predicate, and object of the sentence in order to generate the correct RDF triples. This can be a challenge, especially with complex sentences such as this one. The AI chatbot must be able to analyze the syntax of the sentence and look for nouns, verbs, and other grammatical structures in order to accurately identify the subject, predicate, and object of the sentence. Additionally, the AI chatbot must be able to handle ambiguity and lack of context in order to accurately identify subject-predicate-object relationships.

What are some techniques that the AI chatbot might use to accurately identify the subject, predicate, and object of a sentence?

The AI chatbot might use several techniques to accurately identify the subject, predicate, and object of a sentence, such as natural language processing techniques like part-of-speech tagging, dependency parsing, and semantic analysis. Additionally, the AI chatbot might use machine learning algorithms to identify subject-predicate-object relationships in a sentence. These techniques help the AI chatbot to accurately identify the subject, predicate, and object of the sentence, even if the sentence contains multiple subjects or objects.

What are some examples of machine learning algorithms that can identify subject-predicate-object relationships?

Some examples of machine learning algorithms that can identify subject-predicate-object relationships include supervised machine learning algorithms, such as support vector machines and decision trees, and unsupervised machine learning algorithms, such as clustering and latent semantic analysis. These algorithms can help the AI chatbot to accurately identify the subject, predicate, and object of the sentence, even if the sentence contains multiple subjects or objects.

Can you explain how clustering algorithms identify subject-predicate-object relationships?

Clustering algorithms identify subject-predicate-object relationships by analyzing the text and grouping related words together. For example, if the text contains the words "John," "walk," and "dog," the clustering algorithm will group these words together and identify the subject-predicate-object relationship of "John was walking his dog." The AI chatbot will then store this information as an RDF triple in the form of ("John", "was walking", "dog").

POE NeevaAI :

https://poe.com/NeevaAI

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

I couldn't find an exact answer, but here are some search results that might help:

——-

stackoverflow.com https://stackoverflow.com/questions/21012327/how-to-convert-text-file-document-into-rdf-triples

  • How to convert Text file/document into RDF triples?

——-

arxiv.org https://arxiv.org/pdf/%5B1906.01965%5D(tel:+31190601965).pdf

  • Triple-to-Text: Converting RDF Triples into High-Quality Natural ...

ABSTRACT Knowledge base is one of the main forms to represent informa- tion in a structured way. A knowledge base typically consists of Resource Description Frameworks (RDF) triples which describe the entities and their relations. Generating natural language description of the knowledge base is an important task in NLP, which has been formulated as a conditional language generation task and tackled using the sequence-to-sequence framework. Current works mostly train the language models by maximum likelihood estimation, which tends to generate lousy sentences. In this paper, we argue that such a problem of maximum likelihood estimation is intrinsic, which is generally irrevocable via changing network structures. Accordingly, we propose a novel Triple-to-Text (T2T) framework, which approximately optimizes the inverse Kullback- Leibler (KL) divergence between the distributions of the real and generated sentences. Due to the nature that inverse KL imposes large penalty on fake-looking samples, the proposed method can significantly reduce the probability of generating low-quality sentences. Our experiments on three real-world datasets demonstrate that T2T can generate higher-quality sentences and outperform baseline models in several evaluation metrics.

CCS CONCEPTS • Computing methodologies → Natural language generation;

KEYWORDS Natural Language Generation, Sequence to Sequence Generation, Knowledge Bases

Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence https://arxiv.org/pdf/1906.01965.pdf

——-

yassine-hamoudi.github.io https://yassine-hamoudi.github.io/files/other/RDFTriples.pdf - Extracting RDF triples using the Stanford Parser

https://yassine-hamoudi.github.io/files/other/RDFTriples.pdf

——-

Explore more at neeva.com https://neeva.com/search?q=Split+the+text+into+RDF+triples.+State+the+result+as+%22triples+%3D+%22+followed+by+a+Python+list+of+tuples.%0A%0A%22Robert%2C+who+lives+nearby%2C+was+walking+his+dog.+While+eating+food+Ram+is+singing%22

Projet Pensées Profondes https://projetpp.github.io/

ProjetPP/PPP-QuestionParsing-ML-Standalone https://github.com/ProjetPP/PPP-QuestionParsing-ML-Standalone

POE https://poe.com/PromptGenius :

PromptGenius

Operated by

@LuipY https://poe.com/LuipY The bot can enhance simple prompts, and I recommend using it by saying “My prompt is "…".” You can use this before you use GPT-4, so you can get the most out of it.

This bot uses a model that is powered by OpenAI. Learn more https://poe.com/about Give me the your prompt so I can improve it.

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

Possible answers to the prompt:

1.

[("Robert", "lives nearby", ""), ("Robert", "was walking", "his dog"), ("Ram", "is singing", "while eating food")] 2.

Convert the given text into RDF triples:

triples = [("Robert", "lives nearby", ""), ("Robert", "was walking", "his dog"), ("Ram", "is singing", "while eating food")] 3.

Questions:

a) What is the context or purpose of this task?

b) Do you have any specific requirements or constraints for the RDF triples format?

c) Is there any additional information you would like to include in the triples?


POE CHATGPT :

https://poe.com/ChatGPT

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

https://poe.com/s/G2jaPIyA48oHlrXjy4cr

I apologize, but the given text still cannot be directly expressed as RDF triplesas it lacks any clear semantic structure. However, I can provide an example of how the text could be modified to create RDF triples: Original text: "Robert, who lives nearby, was walking his dog. While eating food Ram is singing" Modified text: "Robert walks his dog near his home. Ram sings while eating food." Here are the RDF triples that can be extracted from the modified text:

json Copy

triples = [ ("Robert", "walks", "his dog"), ("his dog", "is walked near", "his home"), ("Ram", "sings", "while eating food"), ]

Note that these triples are just one possible interpretation of the modified text, and other interpretations may yield different RDF triples. Additionally, the semantics of these triples may not accurately reflect the intended meaning of the original text.

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples. "Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

I apologize, but the given text still cannot be directly expressed as RDF triplesas it lacks any clear semantic structure. However, I can provide an example of how the text could be modified to create RDF triples: Modified text: "Robert lives nearby and walks his dog. Ram sings while eating food." Here are the RDF triples that can be extracted from the modified text: json Copy

triples = [ ("Robert", "lives nearby", ""), ("Robert", "walks", "his dog"), ("Ram", "sings", ""), ("Ram", "eats", "food"), ]

Note that these triples are just one possible interpretation of the modified text, and other interpretations may yield different RDF triples. Additionally, the semantics of these triples may not accurately reflect the intended meaning of the original text.

Context cleared

Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples. "Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

Sorry, but as a language model, I don't have the capability to execute Python code directly. However, here are the RDF triplesfor the given text: json Copy

triples = [ ("Robert", "livesNearby", "nearby"), ("Robert", "isWalking", "hisDog"), ("Ram", "isEating", "food"), ("Ram", "isSinging", None) ]

Note that the RDF triples represent the relationships between the entities in the text. In this case, we have two entities, "Robert" and "Ram", and several relationships between them, such as "livesNearby", "isWalking", "isEating", and "isSinging". The third element in each tuple represents an object or value associated with the relationship. If there is no associated object or value (as in the case of Ram singing), it is represented as None.

Tell me more about RDF triples.

Sure! RDF triples are a way of representing data in the Resource Description Framework(RDF), which is a standard for describing resources on the web. RDF triplesconsist of three parts: a subject, a predicate, and an object.

The subject represents the resource being described, usually identified by a URI (Uniform Resource Identifier). The predicate represents a property or attribute of the resource, also identified by a URI. Finally, the object represents the value of the property or attribute.

For example, let's say we want to represent the fact that "John is a person". We can do this using an RDF triple:

Copy

http://example.com/john http://example.com/isA http://example.com/person

Here, the subject is the URI for John, the predicate is the URI for "is a", and the object is the URI for "person". Note that the URIs don't have to be real web addresses - they just need to be unique identifiers for the resources being described.

RDF triples can be used to represent many types of data, including metadata about web resources, scientific data, and more. They are a flexible and powerful way of representing structured data on the web.

RDF triples can be used to represent a wide variety of information, including metadata about web resources, scientific data, social network relationships, and more. They are often used in combination with other technologies such as SPARQL(a query languagefor RDF data) and OWL(a language for defining ontologies). In RDF, the subject and object of a triple can be any resource, including another triple. This allows for the creation of complex and nested data structures, which can be used to represent complex relationships between resources.

RDF triples can be serialized in several different formats, including RDF/XML, Turtle, and JSON-LD. These formats allow the RDF data to be easily exchanged and processed by different software systems.

One important use case for RDF is the creation of linked data, which involves creating a web of interconnected resources using RDF triples and URIs. Linked data allows for easier discovery and integration of information from different sources, making it easier to build applications that can combine data from multiple sources.

Overall, RDF triples provide a powerful and flexible way of representing structured data on the web, and are an important component of the Semantic Web.

How does SPARQL work with RDF data?

SPARQL (SPARQL Protocol and RDF Query Language) is a query language for RDF data. It provides a way to query and retrieve data from RDF graphs using a syntax similar to SQL.

SPARQL queries are made up of several parts, including a SELECT clause that specifies the variables to be returned, a WHERE clause that specifies the patterns to be matched in the RDF graph, and an optional ORDER BY clause that specifies the order in which the results should be returned.

The WHERE clause is the most important part of a SPARQL query, as it specifies the patterns to be matched in the RDF graph. These patterns are expressed using RDF triples, and can include variables that can match any resource in the graph.

For example, let's say we have the following RDF data that describes some books and their authors:


@prefix ex: <http://example.com/> .

ex:book1 ex:title "The Great Gatsby" ;
         ex:author ex:fitzgerald .

ex:book2 ex:title "To Kill a Mockingbird" ;
         ex:author ex:lee .

We can use SPARQL to query this data and retrieve the titles of all books and their authors:


PREFIX ex: <http://example.com/>

SELECT ?title ?author
WHERE {
  ?book ex:title ?title ;
        ex:author ?author .
}

Here, we are using variables "?title" and "?author" in the SELECT clause to indicate that we want to retrieve the values of the "title" and "author" properties for each book. In the WHERE clause, we are using RDF triples to specify the patterns to be matched - in this case, we want to match any resource that has a "title" property and an "author" property.

When we execute this query, we will get back a result set that includes the titles and authors of all books in the RDF graph:


title author
"The Great Gatsby" ex:fitzgerald
"To Kill a Mockingbird" ex:lee

SPARQL is a powerful tool for querying RDF data and extracting information from it. It allows for complex queries that can retrieve information from multiple sources and combine it in interesting ways, making it a key technology for the Semantic Web.
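As a rough sketch of what a SPARQL engine does with the WHERE clause, the triple-pattern matching can be imitated in a few lines of plain Python (a toy matcher, not a real SPARQL implementation; the data mirrors the Turtle example above):

```python
# Toy triple-pattern matcher imitating a SPARQL WHERE clause.
# Terms starting with '?' are variables; other terms must match exactly.
EX = "http://example.com/"

graph = [
    (EX + "book1", EX + "title", "The Great Gatsby"),
    (EX + "book1", EX + "author", EX + "fitzgerald"),
    (EX + "book2", EX + "title", "To Kill a Mockingbird"),
    (EX + "book2", EX + "author", EX + "lee"),
]

def match(graph, pattern):
    """Yield one dict of variable bindings per matching triple."""
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break  # constant term does not match: try the next triple
        else:
            yield binding

# Analogue of: SELECT ?title WHERE { ?book ex:title ?title }
titles = [b["?title"] for b in match(graph, ("?book", EX + "title", "?title"))]
print(titles)  # ['The Great Gatsby', 'To Kill a Mockingbird']
```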

Can SPARQL query data from multiple RDF graphs?

Yes, SPARQL can query data from multiple RDF graphs. In SPARQL, a graph is identified by its URI, and data can be queried from multiple graphs using the GRAPH keyword.

The GRAPH keyword is used to specify the graph that should be queried. For example, let's say we have two RDF graphs that describe books and their authors, and we want to query both graphs to retrieve the titles of all books and their authors:


@prefix ex1: <http://example.com/books/> .
@prefix ex2: <http://example.com/authors/> .

ex1:book1 ex1:title "The Great Gatsby" ;
          ex1:author ex2:fitzgerald .

ex1:book2 ex1:title "To Kill a Mockingbird" ;
          ex1:author ex2:lee .

ex2:fitzgerald ex2:name "F. Scott Fitzgerald" .
ex2:lee ex2:name "Harper Lee" .

We can use SPARQL to query both graphs and retrieve the titles of all books and their authors:


PREFIX ex1: <http://example.com/books/>
PREFIX ex2: <http://example.com/authors/>

SELECT ?title ?authorName
WHERE {
  GRAPH <http://example.com/books> {
    ?book ex1:title ?title ;
          ex1:author ?author .
  }
  GRAPH <http://example.com/authors> {
    ?author ex2:name ?authorName .
  }
}

Here, we are using the GRAPH keyword to specify the two graphs that we want to query. In the WHERE clause, we are using RDF triples to specify the patterns to be matched in each graph. Note that we are using the variable "?author" to link the book and author graphs together.

When we execute this query, we will get back a result set that includes the titles of all books and the names of their authors:


title authorName
"The Great Gatsby" "F. Scott Fitzgerald"
"To Kill a Mockingbird" "Harper Lee"

By using the GRAPH keyword, we can query data from multiple RDF graphs and combine the results in interesting ways. This allows for the creation of more complex and powerful applications that can work with data from multiple sources.
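The way the shared "?author" variable joins the two GRAPH blocks can likewise be sketched in plain Python (a toy join over two in-memory graphs mirroring the example data, not a SPARQL engine):

```python
# Two named graphs as plain lists of (subject, predicate, object) tuples.
EX1 = "http://example.com/books/"
EX2 = "http://example.com/authors/"

graphs = {
    "http://example.com/books": [
        (EX1 + "book1", EX1 + "title", "The Great Gatsby"),
        (EX1 + "book1", EX1 + "author", EX2 + "fitzgerald"),
        (EX1 + "book2", EX1 + "title", "To Kill a Mockingbird"),
        (EX1 + "book2", EX1 + "author", EX2 + "lee"),
    ],
    "http://example.com/authors": [
        (EX2 + "fitzgerald", EX2 + "name", "F. Scott Fitzgerald"),
        (EX2 + "lee", EX2 + "name", "Harper Lee"),
    ],
}

def match(graph, pattern):
    """Yield variable bindings ('?x' terms) for one triple pattern."""
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding

def join(left, right):
    """Merge binding dicts that agree on every shared variable."""
    for a in left:
        for b in right:
            if all(a[v] == b[v] for v in a.keys() & b.keys()):
                yield {**a, **b}

# One pattern per graph, joined on the shared ?author variable.
books = list(match(graphs["http://example.com/books"],
                   ("?book", EX1 + "author", "?author")))
names = list(match(graphs["http://example.com/authors"],
                   ("?author", EX2 + "name", "?authorName")))
result = [(b["?book"], b["?authorName"]) for b in join(books, names)]
```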

In addition to the GRAPH keyword, SPARQL provides several other features for querying data from multiple RDF graphs.

One such feature is the ability to use UNION to combine the results of two or more queries. UNION allows us to retrieve data from multiple graphs that have a similar structure and combine the results into a single result set.

For example, let's say we have two RDF graphs that describe books and their authors, and we want to retrieve the titles of all books and the names of their authors, regardless of which graph they are in:


@prefix ex1: <http://example.com/books/> .
@prefix ex2: <http://example.com/authors/> .

ex1:book1 ex1:title "The Great Gatsby" ;
          ex1:author ex2:fitzgerald .

ex1:book2 ex1:title "To Kill a Mockingbird" ;
          ex1:author ex2:lee .

ex2:fitzgerald ex2:name "F. Scott Fitzgerald" .
ex2:lee ex2:name "Harper Lee" .

We can use SPARQL to query both graphs and combine the results using UNION:


PREFIX ex1: <http://example.com/books/>
PREFIX ex2: <http://example.com/authors/>

SELECT ?title ?authorName
WHERE {
  {
    GRAPH <http://example.com/books> {
      ?book ex1:title ?title ;
            ex1:author ?author .
    }
    GRAPH <http://example.com/authors> {
      ?author ex2:name ?authorName .
    }
  }
  UNION
  {
    GRAPH <http://example.com/authors> {
      ?author ex2:name ?authorName .
    }
    GRAPH <http://example.com/books> {
      ?book ex1:title ?title ;
            ex1:author ?author .
    }
  }
}

Here, we are using UNION to combine two queries - one that retrieves the titles of books and the names of their authors, and another that retrieves the names of authors and the titles of their books. By using UNION, we can combine the results of these two queries into a single result set.

When we execute this query, we will get back a result set that includes the titles of all books and the names of all authors:


title authorName
"The Great Gatsby" "F. Scott Fitzgerald"
"To Kill a Mockingbird" "Harper Lee"

Overall, SPARQL provides several powerful features for querying data from multiple RDF graphs, including the GRAPH keyword and UNION. These features allow us to work with complex and diverse data sources and build applications that can combine data from multiple sources.

POE GPT4 :

GPT-4 - Poe https://poe.com/GPT-4


Split the text into RDF triples. State the result as "triples = " followed by a Python list of tuples.

"Robert, who lives nearby, was walking his dog. While eating food Ram is singing"

RESPONSE :

triples = [
    ("Robert", "lives", "nearby"),
    ("Robert", "was walking", "his dog"),
    ("Ram", "eating", "food"),
    ("Ram", "is singing", None),
]


jbdatascience commented 1 year ago

Now I hope to find a FREE Open Source variant of one of these LLMs that we can run locally on our PC, that also shows these capabilities!

jbdatascience commented 1 year ago

I am looking into the different steps of the MM algorithm.

There are 4 main steps:

1) Cleaning
2) Spell Checking
3) Simplifying
4) DagAtlas, using similarity

I think the following is true.

Step 1, Cleaning: works fine.

Step 2, Spell Checking, does not give rise to problems, but it could be upgraded with a method that takes the context into account, as you mentioned. I wonder if modern SOTA LLMs could have a role to play here, but I think this is not necessary at the moment.

Step 3, Simplifying, is a major concern. The whole MM algorithm is very sensitive to small changes in the output of this step, as you have mentioned. A better way of doing this would be to use ChatGPT / GPT-4, but we have not found an affordable way of doing that yet, although we know that if we could, it would be a major step forward.

Concerning this problem, I am thinking the following. Because similarity between groups of words plays an important role in constructing the DAGs in the next step (step 4: DagAtlas), maybe we could use another technique for calculating these similarities, namely vector embeddings of these word groups! It is something you yourself were already contemplating, as you mentioned earlier.

So my vision is to transform the MM algorithm to using the following steps:

1) Cleaning
2) Spell Checking
3) Simplifying, but only the splitting-into-clauses part (or perhaps skipping this step altogether)
4) Calculating vector embeddings for all the sentences or clauses coming from the previous step (I believe spaCy has methods for calculating embeddings; there are also GloVe embeddings, and HuggingFace has methods for vector embeddings)
5) DagAtlas, using similarity of these vector embeddings, e.g. cosine similarity
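Step 5 of the proposed pipeline can be sketched as follows (plain Python; the tiny made-up vectors stand in for real sentence embeddings, which have hundreds of dimensions but are compared the same way):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings of three clauses.
emb_a = [0.2, 0.8, 0.1]   # clause A
emb_b = [0.25, 0.7, 0.0]  # a clause similar to A
emb_c = [-0.9, 0.1, 0.4]  # an unrelated clause

# Similar clauses should score higher than unrelated ones.
assert cosine_similarity(emb_a, emb_b) > cosine_similarity(emb_a, emb_c)
```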

Perhaps we lose the fragility that we encounter with the Simplification method we used, and get more robust results, by using vector embeddings for groups of words in our texts!

Also, I have a Python notebook prepared that works with these vector embeddings (which usually have a large dimension, such as 512 or more than 1000) and that can visualize the vectors of a set of words or word groups using state-of-the-art dimensionality-reduction algorithms such as UMAP. Eyeballing that notebook, you can see the similarities between these vectors!

Please share your thoughts on this. It is worth trying I should say!

Hello_World_Meet_Language_AI_COLAB_CPU_PYTHON3_tvscitechtalk.zip

rrtucci commented 1 year ago

The spell checking is almost unnecessary if we do the simplification with LLMs, because LLMs are amazingly resilient to misspellings and missing commas. The old method of simplification was very sensitive to those types of mistakes.

Yes. The old method of calculating similarities should be replaced by the sentence transformers that you discovered earlier at HuggingFace and told me about. These are much faster than the WordNet or Word2Vec methods of calculating sentence similarity, and also more precise. The problem with using Word2Vec to define a metric for sentence similarity is that it represents a sentence by the center of mass of the word vectors in the sentence, which is a terrible definition. I believe the new methods actually consider a vector space of simple sentences rather than a vector space of single words.

jbdatascience commented 1 year ago

So we should be using vector embeddings, do you agree?

rrtucci commented 1 year ago

Definitely.

jbdatascience commented 1 year ago

Then we have to decide which embeddings to use. We could try a number of them:

Getting Started With Embeddings

https://huggingface.co/blog/getting-started-with-embeddings

Notebook Companion: Embedding-as-a-Service.ipynb - Colaboratory

https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/80_getting_started_with_embeddings.ipynb

Models - Hugging Face, Filter: Sentence Similarity

https://huggingface.co/models?other=Sentence%20Similarity

Embeddings - OpenAI API

https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

Models - OpenAI API

https://platform.openai.com/docs/models/embeddings

Embeddings

Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Our second-generation embedding model, text-embedding-ada-002, is designed to replace the previous 16 first-generation embedding models at a fraction of the cost. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks. You can read more about our latest embedding model in the announcement blog post.

Which distance function should I use? We recommend cosine similarity. The choice of distance function typically doesn’t matter much.

OpenAI embeddings are normalized to length 1, which means that:

- Cosine similarity can be computed slightly faster using just a dot product
- Cosine similarity and Euclidean distance will result in identical rankings
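Both claims are easy to verify numerically (plain Python with hypothetical 3-d vectors, normalized to length 1 as the OpenAI docs describe):

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical vectors, normalized to length 1.
u = normalize([3.0, 4.0, 0.0])
v = normalize([1.0, 2.0, 2.0])

# 1) For unit vectors, cosine similarity reduces to a plain dot product.
cosine = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
assert abs(cosine - dot(u, v)) < 1e-12

# 2) Euclidean distance ranks identically: ||u - v||^2 = 2 - 2 * dot(u, v),
#    so a larger dot product always means a smaller distance.
dist_sq = sum((a - b) ** 2 for a, b in zip(u, v))
assert abs(dist_sq - (2 - 2 * dot(u, v))) < 1e-12
```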

https://github.com/Hironsan/awesome-embedding-models

We have to search which ones are available and useful for us.

rrtucci commented 1 year ago

Okay.

We may have a new collaborator in the near future. Someone that I wrote to asking for help with sentence simplification has volunteered to help. It's not a sure thing yet, so I can't say more.

jbdatascience commented 1 year ago

OK, but we can already start off by experimenting with word embeddings, of course.

rrtucci commented 1 year ago

Tonight I noticed that ChatGPT and Bard can give quite different answers for the RDF triples, even if one starts a fresh chat each time. So I asked ChatGPT the following question:

Does ChatGPT always give the same response every time, for the same prompt, if I start a fresh chat each time? Is there a random seed that can be fixed at the beginning of a chat?

And got the following response:

No, ChatGPT does not always give the same response for the same prompt if you start a fresh chat each time. While the model's behavior is generally consistent, it can still produce slightly different responses due to the inherent randomness in its architecture.

Currently, there isn't a way to fix a random seed at the beginning of a chat with ChatGPT. The model's responses are influenced by various factors, including the specific input provided, the context of the conversation, and internal stochasticity. Therefore, even with the same prompt, the responses may still vary.

However, OpenAI has introduced the concept of "system" messages, which are messages prefixed with "system: " and can provide high-level instructions or set the behavior of the model. By using system messages effectively, you can guide the model's responses in a more consistent manner.