gusye1234 / nano-graphrag

A simple, easy-to-hack GraphRAG implementation
MIT License
1.28k stars, 123 forks

Add DSPy for entity extraction #27

Closed. NumberChiffre closed this pull request 1 month ago.

NumberChiffre commented 2 months ago

Description

The goal is to improve the entity extraction step of nano-graphrag by leveraging DSPy.
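How the DSPy module structures its output is not shown here. As a hedged, library-free illustration of the general idea (typed records instead of delimiter-separated text that has to be parsed afterwards), the sketch below defines hypothetical `Entity` and `Relationship` records plus a validation pass. None of these names come from DSPy or nano-graphrag. Name normalization matters in practice: the logs below show the same character as `SCROOGE` in the DSPy run and as `"SCROOGE"`, with quotes, in the default run.

```python
from dataclasses import dataclass


# Hypothetical record types for illustration; these are not DSPy or nano-graphrag classes.
@dataclass(frozen=True)
class Entity:
    name: str
    type: str
    description: str


@dataclass(frozen=True)
class Relationship:
    src: str
    tgt: str
    description: str


def normalize_name(name: str) -> str:
    """Strip whitespace and surrounding double quotes, then uppercase."""
    return name.strip().strip('"').strip().upper()


def validate(entities, relationships):
    """Normalize names, keep the first record per name, and drop dangling relationships."""
    by_name = {}
    for e in entities:
        key = normalize_name(e.name)
        # First description wins here; a real pipeline would more likely merge them.
        by_name.setdefault(key, Entity(key, e.type, e.description))
    kept = []
    for r in relationships:
        src, tgt = normalize_name(r.src), normalize_name(r.tgt)
        # Keep only edges whose endpoints are both known entities.
        if src in by_name and tgt in by_name:
            kept.append(Relationship(src, tgt, r.description))
    return list(by_name.values()), kept


if __name__ == "__main__":
    entities = [
        Entity('"Scrooge"', "PERSON", "a miser"),
        Entity("Christmas", "EVENT", "the holiday"),
        Entity("SCROOGE", "PERSON", "duplicate mention"),
    ]
    relationships = [
        Relationship("scrooge", "christmas", "dislikes"),
        Relationship("Scrooge", "Marley", "former partner"),  # Marley was never extracted
    ]
    ents, rels = validate(entities, relationships)
    print([e.name for e in ents])  # ['SCROOGE', 'CHRISTMAS']
    print(len(rels))  # 1
```

Validating at the record level means a malformed or dangling relationship is dropped before it reaches the graph, rather than surfacing later as an orphan node.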

TODOs

Challenges

Results

With DSPy:

...
DEBUG:nano-graphrag:GraphRAG init with param:

  working_dir = ./nano_graphrag_cache_using_dspy_entity_extraction,
  enable_local = True,
  enable_naive_rag = False,
  chunk_token_size = 1200,
  chunk_overlap_token_size = 100,
  tiktoken_model_name = gpt-4o,
  entity_extract_max_gleaning = 1,
  entity_summary_to_max_tokens = 500,
  graph_cluster_algorithm = leiden,
  max_graph_cluster_size = 10,
  graph_cluster_seed = 3735928559,
  node_embedding_algorithm = node2vec,
  node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
  special_community_report_llm_kwargs = {'response_format': {'type': 'json_object'}},
  embedding_func = {'embedding_dim': 384, 'max_token_size': 256, 'func': <function local_embedding at 0x349f72050>},
  embedding_batch_num = 32,
  embedding_func_max_async = 16,
  query_better_than_threshold = 0.2,
  using_azure_openai = False,
  best_model_func = <function deepseepk_model_if_cache at 0x3124b8b80>,
  best_model_max_token_size = 32768,
  best_model_max_async = 10,
  cheap_model_func = <function deepseepk_model_if_cache at 0x3124b8b80>,
  cheap_model_max_token_size = 32768,
  cheap_model_max_async = 10,
  entity_extraction_func = <function extract_entities_dspy at 0x343e27eb0>,
  key_string_value_json_storage_cls = <class 'nano_graphrag._storage.JsonKVStorage'>,
  vector_db_storage_cls = <class 'nano_graphrag._storage.HNSWVectorStorage'>,
  vector_db_storage_cls_kwargs = {'max_elements': 1000000, 'ef_search': 200, 'M': 50},
  graph_storage_cls = <class 'nano_graphrag._storage.NetworkXStorage'>,
  enable_llm_cache = True,
  addon_params = {},
  convert_response_to_json_func = <function convert_response_to_json at 0x342bf6170>

INFO:nano-graphrag:Load KV full_docs with 0 data
INFO:nano-graphrag:Load KV text_chunks with 0 data
INFO:nano-graphrag:Load KV llm_response_cache with 0 data
INFO:nano-graphrag:Load KV community_reports with 0 data
INFO:nano-graphrag:Created new index for entities
INFO:nano-graphrag:[New Docs] inserting 1 docs
INFO:nano-graphrag:[New Chunks] inserting 42 chunks
INFO:nano-graphrag:[Entity Extraction]...
⠹ Processed 42 chunks, 668 entities(duplicated), 682 relations(duplicated)
DEBUG:nano-graphrag:Trigger summary: SCROOGE
DEBUG:nano-graphrag:Trigger summary: CHRISTMAS
DEBUG:nano-graphrag:Trigger summary: ('SCROOGE', 'CHRISTMAS')
INFO:nano-graphrag:Inserting 436 vectors to entities
...
INFO:nano-graphrag:[Community Report]...
INFO:nano-graphrag:Each level has communities: {0: 19, 1: 49, 2: 7}
INFO:nano-graphrag:Generating by levels: [2, 1, 0]
...
⠴ Processed 75 communities
INFO:nano-graphrag:Writing graph with 441 nodes, 561 edges
indexing time: 1155.397774219513
DEBUG:nano-graphrag:GraphRAG init with param:

  working_dir = ./nano_graphrag_cache_using_dspy_entity_extraction,
  enable_local = True,
  enable_naive_rag = False,
  chunk_token_size = 1200,
  chunk_overlap_token_size = 100,
  tiktoken_model_name = gpt-4o,
  entity_extract_max_gleaning = 1,
  entity_summary_to_max_tokens = 500,
  graph_cluster_algorithm = leiden,
  max_graph_cluster_size = 10,
  graph_cluster_seed = 3735928559,
  node_embedding_algorithm = node2vec,
  node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
  special_community_report_llm_kwargs = {'response_format': {'type': 'json_object'}},
  embedding_func = {'embedding_dim': 384, 'max_token_size': 256, 'func': <function local_embedding at 0x349f72050>},
  embedding_batch_num = 32,
  embedding_func_max_async = 16,
  query_better_than_threshold = 0.2,
  using_azure_openai = False,
  best_model_func = <function gpt_4o_mini_complete at 0x342c05fc0>,
  best_model_max_token_size = 8196,
  best_model_max_async = 4,
  cheap_model_func = <function gpt_4o_mini_complete at 0x342c05fc0>,
  cheap_model_max_token_size = 8196,
  cheap_model_max_async = 4,
  entity_extraction_func = <function extract_entities_dspy at 0x343e27eb0>,
  key_string_value_json_storage_cls = <class 'nano_graphrag._storage.JsonKVStorage'>,
  vector_db_storage_cls = <class 'nano_graphrag._storage.HNSWVectorStorage'>,
  vector_db_storage_cls_kwargs = {'max_elements': 1000000, 'ef_search': 200, 'M': 50},
  graph_storage_cls = <class 'nano_graphrag._storage.NetworkXStorage'>,
  enable_llm_cache = True,
  addon_params = {},
  convert_response_to_json_func = <function convert_response_to_json at 0x342bf6170>

INFO:nano-graphrag:Load KV full_docs with 1 data
INFO:nano-graphrag:Load KV text_chunks with 42 data
INFO:nano-graphrag:Load KV llm_response_cache with 78 data
INFO:nano-graphrag:Load KV community_reports with 75 data
INFO:nano-graphrag:Loaded graph from ./nano_graphrag_cache_using_dspy_entity_extraction/graph_chunk_entity_relation.graphml with 441 nodes, 561 edges
INFO:nano-graphrag:Loaded existing index for entities with 436 elements
INFO:nano-graphrag:Revtrieved 75 communities
INFO:nano-graphrag:Grouping to 3 groups for global search
...
# Top Themes in "A Christmas Carol"

Charles Dickens' "A Christmas Carol" encompasses several prominent themes that play a critical role in conveying its moral and social messages. Below is a synthesis of key themes identified by multiple analysts, highlighting their relevance in the context of the narrative.

## Transformation and Redemption

At the heart of the story lies the theme of transformation and redemption, primarily illustrated through the character of Ebenezer Scrooge. Scrooge undergoes a profound metamorphosis from a miserly, cold-hearted individual to someone who embraces kindness and compassion. Guided by the visits from the three spirits, his journey underscores the notion that change is possible, reflecting the capacity for goodness inherent in every person. This transformation serves as a powerful testament to the human ability to lose oneself and rediscover joy and connection through compassion.

## Importance of Family and Community

The narrative richly highlights the significance of family and community, particularly against the backdrop of Scrooge's initial isolation. The contrasting warmth and affection of the Cratchit family and his nephew Fred emphasize the value of human relationships. Dickens illustrates how coming together during the festive season can foster joy and support, presenting a celebration of togetherness that significantly enriches the human experience.

## Compassion and Social Responsibility

Compassion emerges as another vital theme, depicted through Scrooge's evolving interactions with others, especially the Cratchit family and the vulnerable Tiny Tim. The story advocates for understanding and caring for those who are less fortunate, suggesting that it may lead to personal and societal growth. Additionally, a critique of social responsibility is evident, particularly through the portrayal of the Portly Gentlemen collecting donations. Scrooge’s initial reluctance to contribute juxtaposes the community’s generosity, serving as a reminder of societal duty towards the less privileged and the importance of charity.

## Nostalgia and Reflection

Nostalgia plays a poignant role, with Scrooge reflecting on happier moments from his past that contribute to his understanding of joy and fulfillment. These recollections establish a connection to the values of kindness and community, evoking a bittersweet reminder of what he has lost. This reflective aspect is crucial as it prompts attention to personal regrets and the significance of leaving a positive legacy.

## Joy vs. Despair

Lastly, the dichotomy between joy and despair is explored through the contrasts present in the story. The spirit of Christmas represents joy, kindness, and generosity, while Scrooge’s initial disposition embodies negativity and isolation. This struggle between the two forces emphasizes the story’s call to embrace the positive attributes of life, ultimately suggesting that love and goodwill prevail over sorrow and judgment.

## Conclusion

In conclusion, "A Christmas Carol" presents a rich tapestry of themes such as transformation, the importance of family and community, compassion and social responsibility, nostalgia, and the interplay of joy and despair. Together, these themes invite readers to reflect on their values and relationships, showcasing the enduring power of redemption, compassion, and the spirit of Christmas.
...
# Themes in "A Christmas Carol"

Charles Dickens' "A Christmas Carol" explores several profound themes that resonate deeply with readers. The transformative journey of Ebenezer Scrooge from a miser to a compassionate individual serves as the backdrop for these themes, highlighting the importance of both personal growth and social awareness.

## Redemption

Redemption is perhaps the most significant theme in the story. Scrooge’s character arc illustrates the possibility of change and personal growth. Initially depicted as a cold-hearted miser, Scrooge undergoes a profound transformation facilitated by the visits from supernatural spirits. The Ghost of Jacob Marley warns him of the dire consequences of his avarice, which prompts Scrooge to reflect on his past, present, and the potential future he faces if he does not change. This theme emphasizes the belief that it is never too late to alter one’s path and embrace kindness and compassion.

## Avarice and Generosity

Avarice, or excessive greed, serves as a key driving force behind Scrooge's initial character. His view of the world, particularly around Christmas, is tainted by his focus on money and his disdain for generosity. However, through the transformative experiences with the Ghosts of Christmas Past, Present, and Yet to Come, Scrooge learns that generosity brings joy and connects people. The contrast between his earlier miserly nature and his eventual generosity—demonstrated through his actions toward the Cratchit family, particularly Tiny Tim—underscores the importance of community, charity, and human connection.

## The Spirit of Christmas

The theme of the spirit of Christmas is central to the narrative. Initially, Scrooge views Christmas as a burden, reflecting his isolation and lack of joy. However, as he engages with the spirits and witnesses the warmth and kindness associated with Christmas, he comes to appreciate the holiday's essence of love, family, and togetherness. The transformation of Scrooge highlights how Christmas serves as a time for reflection, renewal, and the importance of embracing the spirit of giving and compassion.

## Family and Connection

Family and relationships play a vital role in Scrooge's transformation. The warmth and love embodied by the Cratchit family serve as a stark contrast to Scrooge's lonely existence. Through his nephew Fred's persistent attempts to reach out, Scrooge learns the importance of familial connections and the joy they can bring. The narrative suggests that meaningful relationships, filled with laughter and support, are integral to a fulfilled life, encouraging readers to cherish their own families.

## Social Critique

Dickens also critiques societal attitudes toward the poor and the injustices prevalent in Victorian society. Scrooge’s initial lack of empathy for those less fortunate, evidenced by his views on the Poor Law and his treatment of Bob Cratchit, reflects the broader social indifference of the time. The story calls for compassion towards the marginalized and emphasizes that society thrives when its members care for one another. In portraying Tiny Tim's vulnerability, Dickens advocates for social responsibility and kindness.

## Conclusion

In summary, "A Christmas Carol" intricately weaves themes of redemption, avarice and generosity, the spirit of Christmas, family and connection, and social critique into its narrative. Through Scrooge’s journey, Dickens illuminates the transformative power of kindness, the importance of human relationships, and the necessity of empathy in a world often dominated by greed and indifference. These timeless themes continue to resonate, encouraging readers to reflect on their own lives and relationships during the festive season.
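Both query runs log `Revtrieved 75 communities` followed by `Grouping to 3 groups for global search`. In GraphRAG-style global search, community reports are packed into groups that each fit a token budget, and each group is answered separately before the partial answers are combined. The following is a hedged sketch of that packing step only: it uses a whitespace word count as a stand-in for a real tokenizer such as tiktoken, the budget and report sizes in the example are made up, and it is not nano-graphrag's own code.

```python
def group_by_token_budget(reports, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily pack reports, in order, into groups whose token total stays within max_tokens.

    A report that alone exceeds the budget still gets its own group, so nothing is dropped.
    """
    groups, current, used = [], [], 0
    for report in reports:
        n = count_tokens(report)
        # Start a new group when adding this report would exceed the budget.
        if current and used + n > max_tokens:
            groups.append(current)
            current, used = [], 0
        current.append(report)
        used += n
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical input: 75 reports of 100 "tokens" each, with a 3000-token budget per group.
    reports = ["word " * 100] * 75
    groups = group_by_token_budget(reports, max_tokens=3000)
    print([len(g) for g in groups])  # [30, 30, 15]
```

Greedy packing preserves the input order, so if reports are sorted by importance beforehand, the most relevant communities land in the earliest groups.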

Without DSPy:

...
DEBUG:nano-graphrag:GraphRAG init with param:

  working_dir = ./nano_graphrag_cache_using_hnsw_as_vectorDB,
  enable_local = True,
  enable_naive_rag = False,
  chunk_token_size = 1200,
  chunk_overlap_token_size = 100,
  tiktoken_model_name = gpt-4o,
  entity_extract_max_gleaning = 1,
  entity_summary_to_max_tokens = 500,
  graph_cluster_algorithm = leiden,
  max_graph_cluster_size = 10,
  graph_cluster_seed = 3735928559,
  node_embedding_algorithm = node2vec,
  node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
  special_community_report_llm_kwargs = {'response_format': {'type': 'json_object'}},
  embedding_func = {'embedding_dim': 384, 'max_token_size': 256, 'func': <function local_embedding at 0x16dc53d90>},
  embedding_batch_num = 32,
  embedding_func_max_async = 16,
  best_model_func = <function deepseepk_model_if_cache at 0x31cd843a0>,
  best_model_max_token_size = 32768,
  best_model_max_async = 10,
  cheap_model_func = <function deepseepk_model_if_cache at 0x31cd843a0>,
  cheap_model_max_token_size = 32768,
  cheap_model_max_async = 10,
  key_string_value_json_storage_cls = <class 'nano_graphrag._storage.JsonKVStorage'>,
  vector_db_storage_cls = <class 'nano_graphrag._storage.HNSWVectorStorage'>,
  vector_db_storage_cls_kwargs = {'max_elements': 1000000, 'ef_search': 200, 'M': 50},
  graph_storage_cls = <class 'nano_graphrag._storage.NetworkXStorage'>,
  enable_llm_cache = True,
  addon_params = {},
  convert_response_to_json_func = <function convert_response_to_json at 0x344412170>

INFO:nano-graphrag:Load KV full_docs with 0 data
INFO:nano-graphrag:Load KV text_chunks with 0 data
INFO:nano-graphrag:Load KV llm_response_cache with 0 data
INFO:nano-graphrag:Load KV community_reports with 0 data
INFO:nano-graphrag:Created new index for entities
INFO:nano-graphrag:[New Docs] inserting 1 docs
INFO:nano-graphrag:[New Chunks] inserting 42 chunks
INFO:nano-graphrag:[Entity Extraction]...
...
⠹ Processed 42 chunks, 821 entities(duplicated), 731 relations(duplicated)
DEBUG:nano-graphrag:Trigger summary: "SCROOGE"
INFO:nano-graphrag:Inserting 612 vectors to entities
...
INFO:nano-graphrag:[Community Report]...
INFO:nano-graphrag:Each level has communities: {0: 16, 1: 51, 2: 8}
INFO:nano-graphrag:Generating by levels: [2, 1, 0]
...
⠴ Processed 75 communities
INFO:nano-graphrag:Writing graph with 615 nodes, 659 edges
indexing time: 832.9665968418121
DEBUG:nano-graphrag:GraphRAG init with param:

  working_dir = ./nano_graphrag_cache_using_hnsw_as_vectorDB,
  enable_local = True,
  enable_naive_rag = False,
  chunk_token_size = 1200,
  chunk_overlap_token_size = 100,
  tiktoken_model_name = gpt-4o,
  entity_extract_max_gleaning = 1,
  entity_summary_to_max_tokens = 500,
  graph_cluster_algorithm = leiden,
  max_graph_cluster_size = 10,
  graph_cluster_seed = 3735928559,
  node_embedding_algorithm = node2vec,
  node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
  special_community_report_llm_kwargs = {'response_format': {'type': 'json_object'}},
  embedding_func = {'embedding_dim': 384, 'max_token_size': 256, 'func': <function local_embedding at 0x16dc53d90>},
  embedding_batch_num = 32,
  embedding_func_max_async = 16,
  best_model_func = <function gpt_4o_mini_complete at 0x344426050>,
  best_model_max_token_size = 8196,
  best_model_max_async = 4,
  cheap_model_func = <function gpt_4o_mini_complete at 0x344426050>,
  cheap_model_max_token_size = 8196,
  cheap_model_max_async = 4,
  key_string_value_json_storage_cls = <class 'nano_graphrag._storage.JsonKVStorage'>,
  vector_db_storage_cls = <class 'nano_graphrag._storage.HNSWVectorStorage'>,
  vector_db_storage_cls_kwargs = {'max_elements': 1000000, 'ef_search': 200, 'M': 50},
  graph_storage_cls = <class 'nano_graphrag._storage.NetworkXStorage'>,
  enable_llm_cache = True,
  addon_params = {},
  convert_response_to_json_func = <function convert_response_to_json at 0x344412170>

INFO:nano-graphrag:Load KV full_docs with 1 data
INFO:nano-graphrag:Load KV text_chunks with 42 data
INFO:nano-graphrag:Load KV llm_response_cache with 160 data
INFO:nano-graphrag:Load KV community_reports with 75 data
INFO:nano-graphrag:Loaded graph from ./nano_graphrag_cache_using_hnsw_as_vectorDB/graph_chunk_entity_relation.graphml with 615 nodes, 659 edges
INFO:nano-graphrag:Loaded existing index for entities with 612 elements
INFO:nano-graphrag:Revtrieved 75 communities
INFO:nano-graphrag:Grouping to 3 groups for global search
...
# Key Themes in the Story

The story is rich with various themes that significantly shape its narrative and moral lessons. Below are the primary themes identified through the analysis of multiple perspectives.

## Transformation and Redemption

A fundamental theme in the story revolves around **transformation and redemption**, exemplified by the character of Ebenezer Scrooge. Initially portrayed as a miserly and cold-hearted individual, Scrooge undergoes a profound transformation, becoming generous and kind-hearted, particularly as a result of his interactions with various spirits. This journey illustrates the potential for change and the importance of personal growth, emphasizing that individuals may redeem themselves regardless of their past behavior.

## Community and Familial Bonds

The significance of **community and familial bonds** emerges prominently throughout the narrative. Relationships among characters, such as those seen with Scrooge's nephew and the Cratchit family, highlight the joy and support that come from connections with others, especially during festive periods like Christmas. This theme underscores the idea that togetherness may foster happiness and fulfillment.

## Compassion and Charity

The concepts of **compassion and charity** are central to the story as well. The plight of the Cratchit family, particularly the struggles of Tiny Tim, evokes empathy and care from Scrooge following his transformation. The narrative calls attention to the importance of kindness towards others, reinforcing the notion that acts of charity can profoundly impact individuals and their communities.

## The Meaning of Christmas

Christmas itself serves as a critical symbol within the story, representing themes of joy, connection, and generosity. Initially, Scrooge's disdain for the holiday starkly contrasts with the spirit of Christmas that he ultimately comes to appreciate. Through his transformational journey, the story reveals the true meaning of Christmas and its potential to inspire change and foster goodwill.

## Wealth and Compassion

The story also addresses the contrast between **wealth and compassion**, emphasizing how Scrooge's initial indifference to the plight of the poor, like Tiny Tim, reflects a broader societal issue. This juxtaposition suggests that true wealth lies not in material possessions but in one's ability to empathize and respond to the needs of others.

## Reflection on Past

Scrooge's encounters with memories emphasize the importance of **reflection on one's past**. Through this self-examination facilitated by the spirits, he learns how past choices shape present behaviors, illustrating the crucial role that understanding one’s history plays in forming a more compassionate and fulfilling future.

## Hope Amidst Despair

Finally, the theme of **hope amid despair** is notably present, particularly illustrated through Tiny Tim's condition, which embodies the possibility for optimism even in the face of ongoing hardships. This theme reinforces the idea that, even during trying circumstances, hope may prevail.

# Conclusion

These themes come together to create a narrative that transcends its immediate story, offering insights about personal transformation, the significance of community, the nature of compassion, and the enduring spirit of hope. Each theme contributes to the overarching message that individuals may grow, change, and positively impact their lives and the lives of others, particularly in the context of love and connection during the holiday season.

...
INFO:nano-graphrag:Using 20 entites, 4 communities, 31 relations, 3 text units
...
# Top Themes in Scrooge's Transformation

The story surrounding Ebenezer Scrooge and his remarkable transformation during Christmas epitomizes several profound themes that resonate through its narrative. Here are the key themes highlighted within the context of Scrooge's journey:

## Redemption and Transformation

One of the most prominent themes is **redemption**. Throughout the story, Scrooge epitomizes the archetype of a misanthrope who is ultimately given a second chance. His encounters with the ghosts—Jacob Marley, the Ghosts of Christmas Past, Present, and Yet to Come—serve as catalysts for his reflection and eventual transformation. Initially, Scrooge is depicted as a miserly man, indifferent to the struggles of those around him. However, by witnessing the consequences of his actions and the joys of familial bonds, he learns the values of compassion and generosity. By the end of the narrative, his transformation is evident in both his demeanor and interactions, marking the power of change in human nature.

## The Spirit of Christmas

The **Spirit of Christmas** is another essential theme woven throughout the story. Scrooge’s initial disdain for the holiday contrasts sharply with the warmth and joy exhibited by characters such as his nephew Fred and the Cratchit family. The narrative highlights how this celebratory spirit fosters connections among people and emphasizes giving and goodwill towards others. After his transformation, Scrooge fully embraces these ideals, reflecting the story's underlying message about the importance of kindness, familial bonds, and community during the Christmas season.

## Isolation vs. Community

Scrooge's life in his gloomy Chambers symbolizes profound **isolation**. His frigid outlook on life alienates him from his family and community, revealing the dangers of living in self-imposed solitude. The story juxtaposes Scrooge’s lonely existence with scenes that depict warmth, camaraderie, and celebration found in the Cratchit home and the celebrations of Christmas. This contrast underscores the importance of community and the detrimental effects of isolation. As Scrooge learns to reconnect with his family and embrace social interactions, the narrative champions the significance of belonging and unity.

## The Consequences of Greed

The theme of **greed** surfaces in various forms through Scrooge's character. His love for wealth and disdain for charitable contributions mark him as a figure consumed by materialism. The spirits he encounters highlight the emptiness of a life dominated by the pursuit of wealth, emphasizing the idea that true richness comes from relationships and helping others rather than hoarding money. Scrooge’s eventual resolution to change his ways speaks to the moral lesson that wealth should be a means to benevolence rather than an end in itself.

## Compassion and the Human Condition

Finally, the theme of **compassion** is critical in journeying towards understanding the human condition. Scrooge's transformation is deeply rooted in his growing awareness of others' struggles, particularly that of Tiny Tim, and the importance of empathy. Through his experience with the spirits, Scrooge reflects on how individual actions can profoundly impact others' lives. His subsequent change of heart and newfound willingness to care for others signal a broader commentary on the value of empathy, charity, and understanding in fostering a cohesive and nurturing society.

In conclusion, the story of Scrooge’s transformation encapsulates essential themes of redemption, the spirit of Christmas, isolation versus community, the consequences of greed, and the importance of compassion. These themes work cohesively to convey a powerful message about the human capacity for change and the significance of nurturing relationships in enhancing the human experience.
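The logs contain enough numbers to compare the two indexing runs directly. The helper below is not part of nano-graphrag; it extracts those numbers with regular expressions from lines reproduced from the logs above. The "dedup ratio" is a derived metric defined only for this comparison: unique entity vectors divided by raw (duplicated) entity extractions.

```python
import re

# Summary lines reproduced from the two indexing runs above (spinner characters removed).
RUNS = {
    "with_dspy": """\
Processed 42 chunks, 668 entities(duplicated), 682 relations(duplicated)
INFO:nano-graphrag:Inserting 436 vectors to entities
INFO:nano-graphrag:Writing graph with 441 nodes, 561 edges
indexing time: 1155.397774219513""",
    "without_dspy": """\
Processed 42 chunks, 821 entities(duplicated), 731 relations(duplicated)
INFO:nano-graphrag:Inserting 612 vectors to entities
INFO:nano-graphrag:Writing graph with 615 nodes, 659 edges
indexing time: 832.9665968418121""",
}

PATTERNS = {
    "raw_entities": r"(\d+) entities\(duplicated\)",
    "raw_relations": r"(\d+) relations\(duplicated\)",
    "entity_vectors": r"Inserting (\d+) vectors to entities",
    "nodes": r"graph with (\d+) nodes",
    "edges": r"(\d+) edges",
    "seconds": r"indexing time: ([\d.]+)",
}


def parse_run(log: str) -> dict:
    """Pull the headline numbers out of one run's log text."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, log)
        if match is None:
            raise ValueError(f"missing {key!r} in log")
        stats[key] = float(match.group(1))
    # Fraction of raw extractions that survive as unique entity vectors.
    stats["entity_dedup_ratio"] = stats["entity_vectors"] / stats["raw_entities"]
    return stats


if __name__ == "__main__":
    for name, log in RUNS.items():
        s = parse_run(log)
        print(f"{name}: {s['nodes']:.0f} nodes, {s['edges']:.0f} edges, "
              f"dedup ratio {s['entity_dedup_ratio']:.2f}, {s['seconds']:.0f}s")
    # with_dspy: 441 nodes, 561 edges, dedup ratio 0.65, 1155s
    # without_dspy: 615 nodes, 659 edges, dedup ratio 0.75, 833s
```

On this single document, with the same indexing model (`deepseepk_model_if_cache`) in both runs, the DSPy run extracted fewer raw entities (668 vs 821), produced a smaller graph (441 vs 615 nodes, 561 vs 659 edges), and took longer to index (about 1155 s vs 833 s). With one run per configuration, these differences may not be systematic.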
codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 98.21429% with 3 lines in your changes missing coverage. Please review.

Project coverage is 94.56%. Comparing base (f1ae7fa) to head (f283212). Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| nano_graphrag/entity_extraction/extract.py | 97.14% | 2 missing |
| nano_graphrag/_op.py | 66.66% | 1 missing |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
+ Coverage   93.98%   94.56%   +0.58%
==========================================
  Files           8       11       +3
  Lines        1030     1195     +165
==========================================
+ Hits          968     1130     +162
- Misses         62       65       +3
```
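The coverage figures in this report are internally consistent, which a few lines of arithmetic can confirm. This sketch assumes coverage is simply hits / lines and that percentages are rounded down to two decimals (an assumption that reproduces the 66.66% shown for `_op.py`, where ordinary rounding would give 66.67%). The per-file and patch line totals it prints are inferred from the percentages and missing-line counts; the report does not state them.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """hits / lines as a percentage, rounded down to 2 decimals using integer math."""
    return hits * 10_000 // lines / 100


def infer_line_count(missed: int, shown_pct: float, max_lines: int = 10_000) -> int:
    """Smallest line count n such that (n - missed) / n rounds down to shown_pct."""
    for n in range(missed + 1, max_lines):
        if coverage_pct(n - missed, n) == shown_pct:
            return n
    raise ValueError("no consistent line count found")


if __name__ == "__main__":
    # Project totals from the coverage diff: base (main) vs head (#27).
    base, head = coverage_pct(968, 1030), coverage_pct(1130, 1195)
    print(base, head, f"{head - base:+.2f}")  # 93.98 94.56 +0.58
    # Per-file rows: 2 missing at 97.14%, 1 missing at 66.66%.
    print(infer_line_count(2, 97.14), infer_line_count(1, 66.66))  # 70 3
    # Patch: 3 missing at 98.21429% (checked at 2 decimals here), so 165 of 168 lines hit.
    n = infer_line_count(3, 98.21)
    print(n, round((n - 3) / n * 100, 5))  # 168 98.21429
```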

gusye1234 commented 1 month ago

Yeah, I have no idea why Codecov didn't trigger again. I re-ran this PR's workflow and it still hit the "Rate limit reached. Please upload with the Codecov repository upload token to resolve issue." error. Maybe I can merge first once you think this PR is ready and see what happens.