langflow-ai / langflow

Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
http://www.langflow.org
MIT License

File Loader and Document loader not working #3959

Closed niko-logothetis closed 1 month ago

niko-logothetis commented 1 month ago

Bug Description

Hi there,

Recently my File Loader stopped loading files, and the Document Loader has the same problem. No file is loaded, the component output is empty, and no error is shown. I pick a file in the open-file dialog (tried PDF, TXT, and CSV), but it is never loaded.

Reproduction

There is no code sample to add.

Expected behavior

Files of different type should be loaded.

Who can help?

No response

Operating System

Windows 11

Langflow Version

1.0.8

Python Version

3.12

Screenshot

No response

Flow File

No response

dosubot[bot] commented 1 month ago

Hey @niko-logothetis! 👋 I'm here to help you with any bugs or questions you have. Let's work together to solve this issue!

In Langflow 1.0.8 on Windows 11 with Python 3.12, the File Loader and Document Loader components load no files (PDF, TXT, or CSV) and show no error. Two likely causes are that the build method is not being called correctly, or that the dynamic import of the loader module is failing silently.

Here is the relevant code snippet from FileLoaderComponent:

def build(self, file_path: str, loader: str) -> Document:
    file_type = file_path.split(".")[-1]

    # Map the selected loader name to its info entry
    selected_loader_info = None
    for loader_info in LOADERS_INFO:
        if loader_info["name"] == loader:
            selected_loader_info = loader_info
            break

    if selected_loader_info is None and loader != "Automatic":
        raise ValueError(f"Loader {loader} not found in the loader info list")

    if loader == "Automatic":
        # Pick the loader automatically based on the file extension
        default_loader_info = None
        for info in LOADERS_INFO:
            if "defaultFor" in info and file_type in info["defaultFor"]:
                default_loader_info = info
                break

        if default_loader_info is None:
            raise ValueError(f"No default loader found for file type: {file_type}")

        selected_loader_info = default_loader_info

    if isinstance(selected_loader_info, dict):
        loader_import: str = selected_loader_info["import"]
    else:
        raise ValueError(f"Loader info for {loader} is not a dict\nLoader info:\n{selected_loader_info}")

    module_name, class_name = loader_import.rsplit(".", 1)

    try:
        # Import the loader dynamically
        loader_module = __import__(module_name, fromlist=[class_name])
        loader_instance = getattr(loader_module, class_name)
    except ImportError as e:
        raise ValueError(f"Loader {loader} could not be imported\nLoader info:\n{selected_loader_info}") from e

    result = loader_instance(file_path=file_path)
    return result.load()

To troubleshoot this issue, you should:

  1. Ensure that the build method is being called with the correct parameters.
  2. Check if the dynamic import of the loader module is failing silently. You can add additional logging to capture any import errors.
  3. Verify that the LOADERS_INFO contains the correct information for the loaders and that the file types are correctly mapped.
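
Step 3 can be checked mechanically. The following is a minimal sketch, assuming LOADERS_INFO is a list of dicts with the "name", "import", and "defaultFor" keys used in the build snippet above. The two entries and the `example_loaders` import paths are hypothetical stand-ins, not Langflow's real table, and the helper names are invented for illustration.

```python
# Hypothetical stand-in for Langflow's LOADERS_INFO; only the key names come from the snippet above.
LOADERS_INFO = [
    {"name": "CSV Loader", "import": "example_loaders.CSVLoader", "defaultFor": ["csv"]},
    {"name": "Text Loader", "import": "example_loaders.TextLoader", "defaultFor": ["txt"]},
]


def find_default_loader(file_path, loaders_info):
    """Return the entry whose "defaultFor" list contains the file extension, or None."""
    # Lower-casing is an addition of this sketch; the original build method does not do it.
    file_type = file_path.split(".")[-1].lower()
    for info in loaders_info:
        if file_type in info.get("defaultFor", []):
            return info
    return None


def unmapped_extensions(extensions, loaders_info):
    """List the extensions that no loader claims as a default."""
    return [
        ext for ext in extensions
        if find_default_loader(f"sample.{ext}", loaders_info) is None
    ]


if __name__ == "__main__":
    # With the stand-in table above, "pdf" has no default loader.
    print(unmapped_extensions(["pdf", "txt", "csv"], LOADERS_INFO))
```

Any extension this reports would hit the "No default loader found" branch when the loader is set to "Automatic".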

If the dynamic import is failing, you might not see any errors if the exception handling is not logging the error properly. Adding logging or print statements can help identify where the failure occurs.
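
One way to add that logging is sketched below, using only the standard library. The function name `import_loader_class` is hypothetical, and a standard-library class stands in for a real loader. Note that in the build snippet above, a misspelled class name would raise AttributeError from getattr, which the `except ImportError` clause does not catch; the sketch logs both cases separately.

```python
import importlib
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("loader_import")


def import_loader_class(dotted_path):
    """Import "package.module.ClassName" and return the class, logging every failure."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # logger.exception records the full traceback before the error propagates.
        logger.exception("Could not import module %r", module_name)
        raise
    try:
        return getattr(module, class_name)
    except AttributeError:
        logger.exception("Module %r has no attribute %r", module_name, class_name)
        raise


if __name__ == "__main__":
    # A standard-library class stands in for a real loader class here.
    print(import_loader_class("collections.OrderedDict"))
```

Re-raising after logging keeps the original behavior (the caller still sees the exception) while leaving a trace in the server log.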

Additionally, ensure that the FileComponent and ChatInput components are correctly configured. The FileComponent should have the correct file types specified, and the ChatInput should be properly set up to handle file inputs. Here is an example configuration for the FileComponent:

class FileComponent(Component):
    display_name = "File"
    description = "A generic file loader."
    icon = "file-text"

    inputs = [
        FileInput(
            name="path",
            display_name="Path",
            file_types=TEXT_FILE_TYPES,
            info=f"Supported file types: {', '.join(TEXT_FILE_TYPES)}",
        ),
        BoolInput(
            name="silent_errors",
            display_name="Silent Errors",
            advanced=True,
            info="If true, errors will not raise an exception.",
        ),
    ]

    outputs = [
        Output(display_name="Data", name="data", method="load_file"),
    ]

    def load_file(self) -> Data:
        if not self.path:
            raise ValueError("Please, upload a file to use this component.")
        resolved_path = self.resolve_path(self.path)
        silent_errors = self.silent_errors

        extension = Path(resolved_path).suffix[1:].lower()

        if extension == "doc":
            raise ValueError("doc files are not supported. Please save as .docx")
        if extension not in TEXT_FILE_TYPES:
            raise ValueError(f"Unsupported file type: {extension}")

        data = parse_text_file_to_data(resolved_path, silent_errors)
        self.status = data if data else "No data"
        return data or Data()
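
The extension handling in load_file can be tried in isolation. This is a minimal sketch that mirrors only the two checks shown above. The TEXT_FILE_TYPES list here is a hypothetical subset, and `check_extension` is a name invented for this example.

```python
from pathlib import Path

# Hypothetical subset; Langflow's real TEXT_FILE_TYPES list may differ.
TEXT_FILE_TYPES = ["txt", "csv", "pdf", "docx"]


def check_extension(path, allowed=TEXT_FILE_TYPES):
    """Mirror the extension checks in load_file and return the normalized extension."""
    # suffix is the last dot-separated part including the dot, e.g. ".PDF"; [1:] drops the dot.
    extension = Path(path).suffix[1:].lower()
    if extension == "doc":
        raise ValueError("doc files are not supported. Please save as .docx")
    if extension not in allowed:
        raise ValueError(f"Unsupported file type: {extension}")
    return extension


if __name__ == "__main__":
    for name in ["report.PDF", "notes.txt", "archive.tar.gz", "README"]:
        try:
            print(name, "->", check_extension(name))
        except ValueError as err:
            print(name, "->", err)
```

Both checks raise ValueError, so a file rejected at this stage produces a visible error. A silent failure with an empty output, as reported in this issue, therefore points to a step before load_file runs, or to the "Silent Errors" option being enabled.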

Ensure that the TEXT_FILE_TYPES and other relevant configurations are correctly set up to support the file types you are trying to load. Additionally, verify that the ChatInput component is correctly configured to handle file inputs, as shown in the provided JSON configuration:

{
  "data": {
    "edges": [
      {
        "className": "",
        "data": {
          "sourceHandle": {
            "dataType": "ChatInput",
            "id": "ChatInput-Emi4q",
            "name": "message",
            "output_types": [
              "Message"
            ]
          },
          "targetHandle": {
            "fieldName": "Question",
            "id": "Prompt-n8yRL",
            "inputTypes": [
              "Message",
              "Text"
            ],
            "type": "str"
          }
        },
        "id": "reactflow__edge-ChatInput-Emi4q{œdataTypeœ:œChatInputœ,œidœ:œChatInput-Emi4qœ,œnameœ:œmessageœ,œoutput_typesœ:[œMessageœ]}-Prompt-n8yRL{œfieldNameœ:œQuestionœ,œidœ:œPrompt-n8yRLœ,œinputTypesœ:[œMessageœ,œTextœ],œtypeœ:œstrœ}",
        "source": "ChatInput-Emi4q",
        "sourceHandle": "{œdataTypeœ: œChatInputœ, œidœ: œChatInput-Emi4qœ, œnameœ: œmessageœ, œoutput_typesœ: [œMessageœ]}",
        "target": "Prompt-n8yRL",
        "targetHandle": "{œfieldNameœ: œQuestionœ, œidœ: œPrompt-n8yRLœ, œinputTypesœ: [œMessageœ, œTextœ], œtypeœ: œstrœ}"
      },
      {
        "className": "",
        "data": {
          "sourceHandle": {
            "dataType": "Prompt",
            "id": "Prompt-n8yRL",
            "name": "prompt",
            "output_types": [
              "Message"
            ]
          },
          "targetHandle": {
            "fieldName": "input_value",
            "id": "OpenAIModel-1hwZ2",
            "inputTypes": [
              "Message"
            ],
            "type": "str"
          }
        },
        "id": "reactflow__edge-Prompt-n8yRL{œdataTypeœ:œPromptœ,œidœ:œPrompt-n8yRLœ,œnameœ:œpromptœ,œoutput_typesœ:[œMessageœ]}-OpenAIModel-1hwZ2{œfieldNameœ:œinput_valueœ,œidœ:œOpenAIModel-1hwZ2œ,œinputTypesœ:[œMessageœ],œtypeœ:œstrœ}",
        "source": "Prompt-n8yRL",
        "sourceHandle": "{œdataTypeœ: œPromptœ, œidœ: œPrompt-n8yRLœ, œnameœ: œpromptœ, œoutput_typesœ: [œMessageœ]}",
        "target": "OpenAIModel-1hwZ2",
        "targetHandle": "{œfieldNameœ: œinput_valueœ, œidœ: œOpenAIModel-1hwZ2œ, œinputTypesœ: [œMessageœ], œtypeœ: œstrœ}"
      },
      {
        "className": "",
        "data": {
          "sourceHandle": {
            "dataType": "OpenAIModel",
            "id": "OpenAIModel-1hwZ2",
            "name": "text_output",
            "output_types": [
              "Message"
            ]
          },
          "targetHandle": {
            "fieldName": "input_value",
            "id": "ChatOutput-sD0lp",
            "inputTypes": [
              "Message"
            ],
            "type": "str"
          }
        },
        "id": "reactflow__edge-OpenAIModel-1hwZ2{œdataTypeœ:œOpenAIModelœ,œidœ:œOpenAIModel-1hwZ2œ,œnameœ:œtext_outputœ,œoutput_typesœ:[œMessageœ]}-ChatOutput-sD0lp{œfieldNameœ:œinput_valueœ,œidœ:œChatOutput-sD0lpœ,œinputTypesœ:[œMessageœ],œtypeœ:œstrœ}",
        "source": "OpenAIModel-1hwZ2",
        "sourceHandle": "{œdataTypeœ: œOpenAIModelœ, œidœ: œOpenAIModel-1hwZ2œ, œnameœ: œtext_outputœ, œoutput_typesœ: [œMessageœ]}",
        "target": "ChatOutput-sD0lp",
        "targetHandle": "{œfieldNameœ: œinput_valueœ, œidœ: œChatOutput-sD0lpœ, œinputTypesœ: [œMessageœ], œtypeœ: œstrœ}"
      },
      {
        "className": "",
        "data": {
          "sourceHandle": {
            "dataType": "ParseData",
            "id": "ParseData-qYLes",
            "name": "text",
            "output_types": [
              "Message"
            ]
          },
          "targetHandle": {
            "fieldName": "Document",
            "id": "Prompt-n8yRL",
            "inputTypes": [
              "Message",
              "Text"
            ],
            "type": "str"
          }
        },
        "id": "reactflow__edge-ParseData-qYLes{œdataTypeœ:œParseDataœ,œidœ:œParseData-qYLesœ,œnameœ:œtextœ,œoutput_typesœ:[œMessageœ]}-Prompt-n8yRL{œfieldNameœ:œDocumentœ,œidœ:œPrompt-n8yRLœ,œinputTypesœ:[œMessageœ,œTextœ],œtypeœ:œstrœ}",
        "source": "ParseData-qYLes",
        "sourceHandle": "{œdataTypeœ: œParseDataœ, œidœ: œParseData-qYLesœ, œnameœ: œtextœ, œoutput_typesœ: [œMessageœ]}",
        "target": "Prompt-n8yRL",
        "targetHandle": "{œfieldNameœ: œDocumentœ, œidœ: œPrompt-n8yRLœ, œinputTypesœ: [œMessageœ, œTextœ], œtypeœ: œstrœ}"
      }
    ]
  }
}

Ensure that the files input in the ChatInput component is correctly configured to handle the file types you are trying to load [1][[2]](https://github.com/langflow-ai/langflow/blob/dev/src/backend/base/langflow/initial_setup/starter_projects/Document QA.json)[[3]](https://github.com/langflow-ai/langflow/blob/main/src/backend/base/langflow/initial_setup/starter_projects/Document QA.json).
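
To see how an exported flow is wired, the edges can be listed with a few lines of Python. The sketch below assumes the export has the `data.edges` shape shown above. The embedded JSON is a trimmed copy that keeps only the fields the sketch reads, and the `File-` id prefix is an assumption based on the `<Type>-<suffix>` pattern of the ids in the excerpt.

```python
import json

# Trimmed copy of two edges from the excerpt; only the fields read below are kept.
FLOW_JSON = """
{
  "data": {
    "edges": [
      {"source": "ChatInput-Emi4q", "target": "Prompt-n8yRL",
       "data": {"targetHandle": {"fieldName": "Question"}}},
      {"source": "ParseData-qYLes", "target": "Prompt-n8yRL",
       "data": {"targetHandle": {"fieldName": "Document"}}}
    ]
  }
}
"""


def list_connections(flow):
    """Return (source id, target id, target field) for every edge in a flow export."""
    return [
        (edge["source"], edge["target"], edge["data"]["targetHandle"]["fieldName"])
        for edge in flow["data"]["edges"]
    ]


def has_edge_from(flow, component_prefix):
    """True if any edge starts at a node whose id begins with the given prefix."""
    return any(
        edge["source"].startswith(component_prefix) for edge in flow["data"]["edges"]
    )


if __name__ == "__main__":
    flow = json.loads(FLOW_JSON)
    for source, target, field in list_connections(flow):
        print(f"{source} -> {target} ({field})")
    # "File-" is an assumed id prefix for the File component.
    print("File node connected:", has_edge_from(flow, "File-"))
```

In the excerpt above, edges start only at ChatInput, Prompt, OpenAIModel, and ParseData nodes, so a check like this is a quick way to confirm whether the File component is connected to anything in a given export.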

niko-logothetis commented 1 month ago

I just found out that it seems to be an issue with the browser I am using. The loader works in Firefox, in Google Chrome it doesn't. Maybe I have too many tabs open.

carlosrcoelho commented 1 month ago

@niko-logothetis Hey, are you still facing this issue on Google Chrome? Is any error showing up for you? Can you send a screenshot?

If not, please let us know if this issue can be closed.

niko-logothetis commented 1 month ago

Hi Carlos, unfortunately the error is still there. When I try to upload a file with the File node, the dialog opens and I choose the file, but it is not uploaded. I tried it with my local install and also on DataStax, with the same result. I started Chrome in debug mode and got the following errors:

[20128:59576:1001/130411.226:ERROR:registration_request.cc(291)] Registration response error message: DEPRECATED_ENDPOINT
[20128:60624:1001/130416.998:ERROR:device_event_log_impl.cc(201)] [13:04:16.998] USB: usb_service_win.cc:105 SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) failed: Element nicht gefunden. (0x490)
[20128:60624:1001/130517.153:ERROR:device_event_log_impl.cc(201)] [13:05:17.148] USB: usb_service_win.cc:105 SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) failed: Element nicht gefunden. (0x490)
[20128:59576:1001/130557.624:ERROR:registration_request.cc(291)] Registration response error message: DEPRECATED_ENDPOINT
[20128:60624:1001/130717.534:ERROR:device_event_log_impl.cc(201)] [13:07:17.534] USB: usb_service_win.cc:105 SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) failed: Element nicht gefunden. (0x490)
[20128:60624:1001/130817.681:ERROR:device_event_log_impl.cc(201)] [13:08:17.681] USB: usb_service_win.cc:105 SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) failed: Element nicht gefunden. (0x490)
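
Each entry in this output starts with a bracketed prefix of the shape `[pid:tid:MMDD/HHMMSS.mmm:LEVEL:file(line)]`. A tally by source file makes it easier to judge whether any entry concerns the upload at all. The sketch below splits run-together log text on that prefix; the regular expression is derived only from the entries pasted here, so treat it as an assumption rather than a full parser for Chrome's log format.

```python
import re
from collections import Counter

# Prefix shape seen in the pasted output: [pid:tid:MMDD/HHMMSS.mmm:LEVEL:file(line)]
PREFIX = re.compile(r"\[(\d+):(\d+):(\d{4}/\d{6}\.\d{3}):([A-Z]+):([\w.]+)\((\d+)\)\]")

# Two entries copied from the output above, joined the way they were pasted.
SAMPLE = (
    "[20128:59576:1001/130411.226:ERROR:registration_request.cc(291)] "
    "Registration response error message: DEPRECATED_ENDPOINT "
    "[20128:60624:1001/130416.998:ERROR:device_event_log_impl.cc(201)] "
    "[13:04:16.998] USB: usb_service_win.cc:105 "
    "SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) "
    "failed: Element nicht gefunden. (0x490)"
)


def split_entries(raw):
    """Split run-together log text into (level, source_file, message) tuples."""
    matches = list(PREFIX.finditer(raw))
    entries = []
    for i, match in enumerate(matches):
        # The message runs from the end of this prefix to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        message = raw[match.end():end].strip()
        entries.append((match.group(4), match.group(5), message))
    return entries


if __name__ == "__main__":
    entries = split_entries(SAMPLE)
    print(Counter(source for _, source, _ in entries))
```

The inner `[13:04:16.998]` timestamp does not match the prefix pattern, so it stays part of the message instead of starting a new entry.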

Any ideas?

Just to mention again: if I upload a file in Firefox, it works, and the file then also appears when I go back to Chrome.

Chrome Version: 129.0.6668.71 (Official Build) (64-bit)
Langflow Version: 1.0.18

carlosrcoelho commented 1 month ago

@niko-logothetis

Could you share your flow?

carlosrcoelho commented 1 month ago

Thank you for your contribution! This issue will be closed. If you have any questions or encounter another problem, please open a new issue and we will be ready to help you.