
[Question]: MetadataReplacementPostProcessor:KeyError: 'window' #12693

Closed · JasonFlyBeauty closed this issue 2 months ago

JasonFlyBeauty commented 2 months ago

Question

def Node_Sentence_Window():

    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=3,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )

    documents = SimpleDirectoryReader("/root/ragas/ragas/test/file").load_data(show_progress=True)

    llm = LangChainLLM(BaichuanLLM())

    embedding = LangchainEmbeddingsWrapper(CustomEmbedding())

    Settings.llm = llm

    Settings.embed_model = embedding

    Settings.node_parser = node_parser

    sentence_index = VectorStoreIndex.from_documents(
        # llm=llm,
        documents=documents,
        embed_model=embedding,
    )

    nodes = node_parser.get_nodes_from_documents(documents)

    for node in nodes[:5]: 
        print(node.metadata)

    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")

    sentence_window_engine = sentence_index.as_query_engine(
        similarity_top_k=2, node_postprocessors=[postproc], llm=llm
    )

    question = "try to explain systematic search?"
    response = sentence_window_engine.query(question)
    print(f"response: {response}")
    print(f"len: {len(response.source_nodes)}")
    print("response.source_nodes[0].node.metadata",response.source_nodes[0].node.metadata)

    window = response.source_nodes[0].node.metadata["window"]
    sentence = response.source_nodes[0].node.metadata["original_text"]
    print("------------------")
    print(f"Window: {window}")
    print("------------------")
    print(f"Original Sentence: {sentence}")
Loading files: 100%|█████████████████████████████████| 1/1 [00:02<00:00,  2.83s/file]
{'window': 'A Survey of App Store Analysis\nfor Software Engineering\nWilliam Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman\nAbstract— App Store Analysis studies information about applications obtained from app stores.  App stores provide a wealth of\ninformation derived from users that would not exist had the applications been distributed via previous software deployment methods.\n App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these\nforms of software repositories.  Findings from App Store Analysis have a direct and actionable impact on the software teams that\ndevelop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security\nand testing. ', 'original_text': 'A Survey of App Store Analysis\nfor Software Engineering\nWilliam Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman\nAbstract— App Store Analysis studies information about applications obtained from app stores. ', 'page_label': '1', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}
{'window': 'A Survey of App Store Analysis\nfor Software Engineering\nWilliam Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman\nAbstract— App Store Analysis studies information about applications obtained from app stores.  App stores provide a wealth of\ninformation derived from users that would not exist had the applications been distributed via previous software deployment methods.\n App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these\nforms of software repositories.  Findings from App Store Analysis have a direct and actionable impact on the software teams that\ndevelop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security\nand testing.  This survey describes and compares the areas of research that have been explored thus far, drawing out common\naspects, trends and directions future research should take to address open problems and challenges.\n', 'original_text': 'App stores provide a wealth of\ninformation derived from users that would not exist had the applications been distributed via previous software deployment methods.\n', 'page_label': '1', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}
{'window': 'A Survey of App Store Analysis\nfor Software Engineering\nWilliam Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman\nAbstract— App Store Analysis studies information about applications obtained from app stores.  App stores provide a wealth of\ninformation derived from users that would not exist had the applications been distributed via previous software deployment methods.\n App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these\nforms of software repositories.  Findings from App Store Analysis have a direct and actionable impact on the software teams that\ndevelop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security\nand testing.  This survey describes and compares the areas of research that have been explored thus far, drawing out common\naspects, trends and directions future research should take to address open problems and challenges.\n Index Terms— App store, analysis, mining, API, feature, release planning, requirements engineering, reviews, security, ecosystem\nÇ\n1I NTRODUCTION\nAPPstores are a recent phenomenon: Apple’s App Store\nand Google Play were launched in 2008, and since then\nboth have accumulated in excess of 1 million downloadable\nand rateable apps. ', 'original_text': 'App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these\nforms of software repositories. ', 'page_label': '1', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}
{'window': 'A Survey of App Store Analysis\nfor Software Engineering\nWilliam Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman\nAbstract— App Store Analysis studies information about applications obtained from app stores.  App stores provide a wealth of\ninformation derived from users that would not exist had the applications been distributed via previous software deployment methods.\n App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these\nforms of software repositories.  Findings from App Store Analysis have a direct and actionable impact on the software teams that\ndevelop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security\nand testing.  This survey describes and compares the areas of research that have been explored thus far, drawing out common\naspects, trends and directions future research should take to address open problems and challenges.\n Index Terms— App store, analysis, mining, API, feature, release planning, requirements engineering, reviews, security, ecosystem\nÇ\n1I NTRODUCTION\nAPPstores are a recent phenomenon: Apple’s App Store\nand Google Play were launched in 2008, and since then\nboth have accumulated in excess of 1 million downloadable\nand rateable apps.  Google announced that there were 1.4 bil-\nlion activated Android devices in September 2015 [32].\n', 'original_text': 'Findings from App Store Analysis have a direct and actionable impact on the software teams that\ndevelop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security\nand testing. ', 'page_label': '1', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}
{'window': 'App stores provide a wealth of\ninformation derived from users that would not exist had the applications been distributed via previous software deployment methods.\n App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these\nforms of software repositories.  Findings from App Store Analysis have a direct and actionable impact on the software teams that\ndevelop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security\nand testing.  This survey describes and compares the areas of research that have been explored thus far, drawing out common\naspects, trends and directions future research should take to address open problems and challenges.\n Index Terms— App store, analysis, mining, API, feature, release planning, requirements engineering, reviews, security, ecosystem\nÇ\n1I NTRODUCTION\nAPPstores are a recent phenomenon: Apple’s App Store\nand Google Play were launched in 2008, and since then\nboth have accumulated in excess of 1 million downloadable\nand rateable apps.  Google announced that there were 1.4 bil-\nlion activated Android devices in September 2015 [32].\n Mobile app stores are also extremely lucrative: the set of\nonline mobile app stores were projected to be worth a com-\nbined 25 billion USD in 2015 [152]. ', 'original_text': 'This survey describes and compares the areas of research that have been explored thus far, drawing out common\naspects, trends and directions future research should take to address open problems and challenges.\n', 'page_label': '1', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}
/root/miniconda3/envs/ragas/lib/python3.8/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `predict` was deprecated in LangChain 0.1.7 and will be removed in 0.2.0. Use invoke instead.
  warn_deprecated(
response: Systematic search is a methodical and organized approach to locating relevant information, particularly in academic research. It involves the use of specific keywords, phrases, or combinations of terms (known as search queries) to find relevant materials, such as articles, papers, or documents, within digital libraries or databases.

In the context provided, the researchers employed systematic search techniques to identify relevant papers for their survey on app store analysis for software engineering. They used various search queries, including both specific and general ones, across different paper repositories such as Google Scholar, IEEE, ACM, JSTOR, arXiv, and Scopus.

The researchers also mention using snowballing, which is a technique where references from already identified relevant papers are examined to find additional materials. This helps to ensure that the search is comprehensive and reduces the risk of missing important papers.

In summary, systematic search is a structured and targeted process designed to maximize the efficiency and effectiveness of locating relevant information for a particular research topic or area of interest.
len: 2
response.source_nodes[0].node.metadata {'page_label': '3', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}
Traceback (most recent call last):
  File "ragastest.py", line 819, in <module>
    main()
  File "ragastest.py", line 815, in main
    Node_Sentence_Window()
  File "ragastest.py", line 785, in Node_Sentence_Window
    window = response.source_nodes[0].node.metadata["window"]
KeyError: 'window'
dosubot[bot] commented 2 months ago

Hey @JasonFlyBeauty! Great to see you back with us. I hope all is well on your end. 😊

The KeyError: 'window' you're encountering indicates that the "window" key is missing from the metadata of the node returned by your sentence window engine query. This suggests that either the nodes in your sentence index do not include this key in their metadata, or it's being removed during the query process.

To troubleshoot, first verify the presence of the "window" key in the metadata of the nodes returned by your query and in your sentence index. You can do this by printing the metadata of these nodes. If the "window" key is missing, you'll need to ensure it's added to the metadata of each node in your sentence index or identify where it's being removed during the query process and adjust your code accordingly.
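For example, a quick check (reusing the nodes and response variables from your snippet; this is only a diagnostic sketch, not a fix) could look like this:

    # Inspect the parsed nodes: these should carry the "window" key set by
    # SentenceWindowNodeParser.
    for node in nodes[:5]:
        print("parsed node metadata keys:", list(node.metadata.keys()))

    # Inspect the retrieved source nodes: if "window" is missing here, the index
    # was built from nodes that never went through the sentence-window parser.
    for src in response.source_nodes:
        print("retrieved node has 'window':", "window" in src.node.metadata)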

This approach will help you pinpoint the source of the issue and take the necessary steps to resolve it. If you need further assistance or have more questions, feel free to ask.

JasonFlyBeauty commented 2 months ago

@dosu-bot I added debug prints to MetadataReplacementPostProcessor._postprocess_nodes and re-ran the same function; the retrieved source nodes never contain the "window" key:

class MetadataReplacementPostProcessor(BaseNodePostprocessor):
    target_metadata_key: str = Field(
        description="Target metadata key to replace node content with."
    )

    def __init__(self, target_metadata_key: str) -> None:
        super().__init__(target_metadata_key=target_metadata_key)

    @classmethod
    def class_name(cls) -> str:
        return "MetadataReplacementPostProcessor"

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        for n in nodes:
            print("Before processing:", n.node.metadata)
            n.node.set_content(
                n.node.metadata.get(
                    self.target_metadata_key,
                    n.node.get_content(metadata_mode=MetadataMode.NONE),
                )
            )

        print(f"return nodes :{nodes}")
        return nodes
def Node_Sentence_Window():

    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=3,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )

    documents = SimpleDirectoryReader("/root/ragas/ragas/test/file").load_data(show_progress=True)

    llm = LangChainLLM(BaichuanLLM())

    embedding = LangchainEmbeddingsWrapper(CustomEmbedding())

    Settings.llm = llm

    Settings.embed_model = embedding

    Settings.node_parser = node_parser

    sentence_index = VectorStoreIndex.from_documents(
        # llm=llm,
        documents=documents,
        embed_model=embedding,
    )

    nodes = node_parser.get_nodes_from_documents(documents)

    # for node in nodes[:5]: 
    #     print(node.metadata)
    print(nodes[0].metadata)

    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")

    sentence_window_engine = sentence_index.as_query_engine(
        similarity_top_k=2, node_postprocessors=[postproc], llm=llm
    )

    question = "try to explain systematic search?"
    response = sentence_window_engine.query(question)
    print(f"response: {response}")
    print(f"len: {len(response.source_nodes)}")
    print("response.source_nodes[0].node.metadata",response.source_nodes[0].node.metadata)

    window = response.source_nodes[0].node.metadata["window"]
    sentence = response.source_nodes[0].node.metadata["original_text"]
    print("------------------")
    print(f"Window: {window}")
    print("------------------")
    print(f"Original Sentence: {sentence}")
Loading files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.30s/file]

{'window': 'A Survey of App Store Analysis\nfor Software Engineering\nWilliam Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman\nAbstract— App Store Analysis studies information about applications obtained from app stores.  App stores provide a wealth of\ninformation derived from users that would not exist had the applications been distributed via previous software deployment methods.\n App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these\nforms of software repositories.  Findings from App Store Analysis have a direct and actionable impact on the software teams that\ndevelop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security\nand testing. ', 'original_text': 'A Survey of App Store Analysis\nfor Software Engineering\nWilliam Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman\nAbstract— App Store Analysis studies information about applications obtained from app stores. ', 'page_label': '1', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}

Before processing: {'page_label': '3', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}

Before processing: {'page_label': '15', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}

return nodes :[NodeWithScore(node=TextNode(id_='57e4121a-dd1d-409b-a301-38b860f6cff5', embedding=None, metadata={'page_label': '3', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='8babc8ae-61c0-4274-a12a-efac0e5630ae', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '3', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}, hash='d62451b6e8592e69321f453995eddd86c08aedb7f8ae7dfa906807c2b1f04765'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='1ccbba0c-a8ae-4899-aacb-479fcb02d154', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '3', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}, hash='a3345b2b53acd941d3e0b5e3478690f6d42969c971b9cae4821a2588d1bb0787'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='06bf231d-5794-4406-88eb-6b01eaf3f67c', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='519d60667e3fdefeaccede688951393a7555056ee6d571595b278c71d7b58e2e')}, text='Since, by our definition,\napp store analysis research uses collections of apps, this\nshould encompass much of the field. We also performed\nsnowballing, which further helps to mitigate the threat\nof potentially missing papers. 
However, the threat of\nmissing papers is a threat to the validity of any survey,\nincluding this one.TABLE 1\nSearch Query Results Indicating the Number of Hits Each Query Generates, the Number of These That Were Available\nto Be Inspected, the Number of Titles and Subsequent Abstracts and Paper Bodies that Were Accepted as Valid\nSpecific\nQueries“app\nstore\nanalysis”“app\nstore\nanalysis”\nAND mining“app store analysis”\nAND mining\nAND API“app store\nanalysis”\nAND mining\nAND feature“app\nstore analysis”“app\nstore analysis”\nAND mining“app\nstore analysis”\nAND mining\nAND API“app\nstore analysis”\nAND mining\nAND feature\nGoogle Scholar IEEE\nHits 35 17 9 13 3 40 13 13\nInspect 35 17 9 13 3 40 13 13\nTitle 15 13 8 12 3 8 8 8\nAbstract 13 13 8 12 3 7 4 4\nBody 12 13 8 12 3 5 4 4\nACM JSTOR\nHits 7 1,146 295 231 0 36 4 13\nInspect 7 1,146 295 231 0 36 4 13\nTitle 46 9 4 4 3 1 0 0 0 0\nAbstract 35 7 2 7 2 2 0 0 0 0\nBody 34 4 2 6 1 7 0 0 0 0\narXiv Scopus\nHits 0 81 28 10 1 128 21 1\nInspect 0 81 28 10 1 128 21 1\nTitle 0 4 1 0 1 128 21 1\nAbstract 04 1 0 0 1 3 6 0\nBody 04 1 0 0 1 1 4 0\nGeneral\nQueries“app store”\nAND analysis\nAND API“app store”\nAND analysis\nAPI AND mine“app store”\nAND analysis\nAND feature\nAND mine“app store\nanalysis”\nAND mining\nAND requirements“app store\nanalysis”\nAND mining\nAND release“app store\nanalysis”\nAND mining\nAND reviews“app\nstore analysis”\nAND mining\nAND security“app store\nanalysis” AND\nmining\nAND ecosystem\nGoogle Scholar\nHits 3,130 409 1040 12 9 15 9 9\nInspect 1,000 409 1,000 12 9 15 9 9\nTitle 87 35 37 12 9 14 8 9\nAbstract 61 23 33 12 9 14 8 9\nBody 52 21 32 12 9 14 8 9\nThe top boxes indicate more specific queries run in multiple paper repositories, and the lower boxes indicate the more general queries run only in Googl e Scholar.\nIn the case of Google Scholar, only the top 1,000 results were accessible to inspect at the time of search.MARTIN ET AL.: A SURVEY OF APP STORE ANALYSIS FOR SOFTWARE ENGINEERING 819', start_char_idx=1961, end_char_idx=4114, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.5296635663837159), NodeWithScore(node=TextNode(id_='85b47111-fa83-45d3-a724-30bb25d4dd85', embedding=None, metadata={'page_label': '15', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='d41afb92-c61f-4b36-9802-04e0b53aeb59', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '15', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}, hash='d1b784d24f33607996678fc4a9f01e017ac256758abc731875f2ccaf1ced0b93'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='6c4b9137-4bec-4a28-b17d-81eacf89822e', node_type=<ObjectType.TEXT: '1'>, 
metadata={'page_label': '14', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}, hash='be63869fd11bc13df0e1543634045fedcdd3f186a976d4093994bb02a6541a9f'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='033bcd82-5df2-4f30-bf74-2ec447627397', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='3f1c3614b8314aff12b07b4c6d0876e0fb38660ecedd46c9231e2f1b0f3a875d')}, text='9.3 Requirements Engineering\nOh et al. [174] developed a review digest system, that they\ntested on 1,711,556 reviews mined from 24,000 Google Play\napps. They automatically categorised reviews into bug\nreports, functional requests and non-functional requests,\nand produced a digest featuring the most informative\nreviews in each category.\nIacob and Harrison [104] presented an automated system\n(MARA ) for extracting and analysing app reviews in order to\nidentify feature requests. The system is particularly useful\nbecause it offers a simple and intuitive approach to identify-\ning requests. One sixty-one apps and 3,279 reviews were\nused for manually training linguistic rules. 136,998 reviews\nwere used for the evaluation, which found that 23.3 percent\nof reviews contained feature requests.\nAs an extension to the MARA system they had previously\nintroduced [106], Iacob et al. [105] introduced a set of lin-\nguistic rules for identifying feature requests and bug reports\nin order to help facilitate app development. Wano and\nIio [241] analysed the textual content of 856 reviews from\n500 apps in the Japanese App Store, and found that the\nreview styles differed between apps in different categories.\nIn a large scale study, Eri /C19c et al. [61] studied the star ratings\nof 48 million reviews mined from 968 popular free and paid\nApple apps. They found that the reviews were mostly posi-\ntive, and that there were significant differences in the distri-\nbutions between categories, and also between free and paid\nsections. Free apps had more reviews but a lower mean rat-\ning, and higher standard deviation. Due to the higher num-\nbers of reviews for free apps, which might give an app\ncredibility, the authors argued that in-app purchasing reve-\nnue models were a good way to make money for develop-\ners, especially if used as a ‘teaser’ for a paid version.\nPark et al. [179] developed AppLDA, a topic model\ndesigned for use on app descriptions and user reviews, that\ndiscards review-only topics. This enables developers to\ninspect the reviews that discuss features present in the app\ndescriptions. The authors tested the system on 1,385,607\nreviews mined from 43,041 apps. Panichella et al. [178] pre-\nsented a system for automatically classifying user reviews\nbased on a predetermined taxonomy, in order to support\nsoftware maintenance and requirements evolution. They\nverified the system on a manually labelled truth set of 1,421\nsentences extracted from reviews, and achieved a precision\nof 0.85 and recall of 0.85, when training the system on lan-\nguage structure, content and sentiment features. Maalej and\nNabil [146] produced a classification method identifying\nbug reports and feature requests from user reviews. 
The\nauthors found that upwards of 70 percent precision and 80\npercent recall could be obtained using multiple binary clas-\nsifiers, as an alternative to a single multiclass classifier.\nThey also found that the commonly used NLP techniques,\nstopword removal and lemmatisation, could negatively\naffect the performance of this classification task.\nMoran et al. [168] proposed the FUSION system, that per-\nforms static and dynamic analysis on Android apps, in\norder to help users complete bug reports. The system\nfocuses on the steps to reproduce a bug, using dynamic\nanalysis to walk through Android system events. Khalid\net al. [119] argued that app store reviews can be used for\n“crowdsourcing” [150]. They argued that users areinadvertently performing crowdsourcing when they review\napps, solving the following problems: requests for potential\nfeatures, suggestions for developer action, recommenda-\ntions for other users, and issue reporting.\nThe requirements engineering review literature has used\nreviews to extract bug reports and feature requests, in\naddition to prioritising critical reviews.\n9.4 Sentiment\nThe works discussed in this section have incorporated senti-\nment in their study of reviews. Sentiment describes a user’s\nviews or opinions, typically as positive or negative in this\ncontent, and is extracted from reviews using ‘positive’ senti-\nment words such as ‘good,great,love’, or ‘negative’ senti-\nment words such as ‘bad,hate,terrible’.\nIn 2012 Goul et al.', start_char_idx=0, end_char_idx=4196, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.516537569669691)]
/root/miniconda3/envs/ragas/lib/python3.8/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `predict` was deprecated in LangChain 0.1.7 and will be removed in 0.2.0. Use invoke instead.
  warn_deprecated(

response: Systematic search is a methodical and organized approach to locating relevant information or documents for a specific research topic or question. In the context of academic research, systematic search involves using a combination of keywords, Boolean operators, and other search strategies to find articles, papers, and other sources that are relevant to the research topic.

This process often includes searching multiple databases and repositories, such as Google Scholar, IEEE, ACM, JSTOR, arXiv, and Scopus, to ensure that all relevant materials are identified. The search queries are designed to be as specific as possible to reduce the number of irrelevant results and increase the efficiency of the search process.

In the example provided, the researchers used specific queries such as "app store analysis" and "app store analysis" AND mining to narrow down the search results to only those articles that were directly related to their research topic. They also used general queries, such as "app store" AND analysis AND API, to cast a wider net and identify additional relevant materials.

The researchers also mention using snowballing, which is a technique for finding additional studies by examining the references cited in the initial set of identified papers. This helps to mitigate the threat of missing important papers that may not have been found through the initial search queries.

Overall, systematic search is an essential component of conducting a comprehensive literature review and ensuring the validity of the research findings.

len: 2

response.source_nodes[0].node.metadata {'page_label': '3', 'file_name': 'A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_path': '/root/ragas/ragas/test/file/A_Survey_of_App_Store_Analysis_for_Software_Engineering.pdf', 'file_type': 'application/pdf', 'file_size': 1229461, 'creation_date': '2024-04-09', 'last_modified_date': '2024-04-09'}
Traceback (most recent call last):
  File "ragastest.py", line 820, in <module>
    main()
  File "ragastest.py", line 816, in main
    Node_Sentence_Window()
  File "ragastest.py", line 786, in Node_Sentence_Window
    window = response.source_nodes[0].node.metadata["window"]
KeyError: 'window'
dosubot[bot] commented 2 months ago

The KeyError: 'window' you're encountering suggests that the "window" key is not present in the node's metadata when you attempt to access it. This issue could arise from several factors, including inconsistencies in how the metadata is set or accessed, or if nodes bypass certain processing steps. Here are steps to address and potentially resolve the issue:

  1. Ensure Consistent Processing: Confirm that all nodes pass through the SentenceWindowNodeParser processing, specifically through the method that sets the "window" metadata. This is crucial for the metadata to be present.

  2. Check Key Consistency: Make sure the key used to set and access the metadata is consistent. The key is "window" in your setup, so verify there are no discrepancies or typos in your code.

  3. Implement Debugging Output: Add debugging statements immediately after the metadata is set. This can help verify that the "window" metadata is being correctly assigned to each node.

  4. Safely Access Metadata: Use the .get() method to access the "window" metadata. This method avoids raising a KeyError by returning None or a specified default value if the key is not found, aiding in debugging (see the sketch after this list).

  5. Review Node Lifecycle: Ensure that the nodes being queried are the same ones that had their metadata set. It's possible that nodes could be recreated or replaced without the necessary metadata, leading to the observed issue.
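For step 4, a minimal sketch of the safer access pattern (reusing the response variable from your snippet) would be:

    # Read the metadata with .get() so a missing key produces a readable
    # message instead of a KeyError.
    metadata = response.source_nodes[0].node.metadata
    window = metadata.get("window")
    sentence = metadata.get("original_text")
    if window is None:
        print("'window' not found; available keys:", list(metadata.keys()))
    else:
        print(f"Window: {window}")
        print(f"Original Sentence: {sentence}")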

For a deeper dive into the node processing and metadata setting, you might want to review the SentenceWindowNodeParser class in the LlamaIndex codebase. This could provide further insights into how nodes are handled and how metadata is assigned.

logan-markewich commented 2 months ago

@JasonFlyBeauty This works fine for me

from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.readers import SimpleDirectoryReader
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

documents = SimpleDirectoryReader("docs/docs/examples/data/paul_graham").load_data()

Settings.node_parser = node_parser

sentence_index = VectorStoreIndex.from_documents(
    documents,
)

nodes = node_parser.get_nodes_from_documents(documents)

print(nodes[0].metadata.keys())

postproc = MetadataReplacementPostProcessor(target_metadata_key="window")

sentence_window_engine = sentence_index.as_query_engine(
    similarity_top_k=2, node_postprocessors=[postproc],
)

question = "What happened to the author at Viaweb?"
response = sentence_window_engine.query(question)
print(f"response: {response}")
print(f"len: {len(response.source_nodes)}")
print("response.source_nodes[0].node.metadata",response.source_nodes[0].node.metadata)

window = response.source_nodes[0].node.metadata["window"]
sentence = response.source_nodes[0].node.metadata["original_text"]

print("Source metadata keys", str(response.source_nodes[0].node.metadata.keys()))
print("------------------")
print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")
JasonFlyBeauty commented 2 months ago

> @JasonFlyBeauty This works fine for me

Thank you very much for the help you provided. I double-checked my code and found that a dependency package conflict was the cause. Thanks again for your friendly help!
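For anyone hitting the same error: one way to spot such conflicts is to list the installed llama-index and related distributions and compare their versions (a minimal sketch; the package-name prefixes below are assumptions about a typical environment, not taken from this thread):

    # List installed distributions whose names suggest they belong to the
    # llama-index / langchain / ragas stacks, to spot version mismatches.
    from importlib.metadata import distributions

    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name.startswith(("llama-index", "llama_index", "langchain", "ragas")):
            print(f"{name}=={dist.version}")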