kyma-project / kyma-companion

A tool that brings AI to Kyma
Apache License 2.0

feat: added script to pull documents for RAG #241

Closed mfaizanse closed 1 month ago

mfaizanse commented 1 month ago

Description

Changes proposed in this pull request:

Related issue(s)

github-actions[bot] commented 1 month ago

Note(s) for PR Author:

muralov commented 1 month ago

After fetching the docs, I found the following files, most of which can be excluded:

api-gateway/docs/release-notes: contains mostly release notes.

api-gateway/docs/contributor/adr/adr-template.md:

# <!--- Provide title -->

## Status
<!--- Specify the current state of the ADR, such as whether it is proposed, accepted, rejected, deprecated, superseded, etc. -->

## Context
<!--- Describe the issue or problem that is motivating this decision or change. -->

## Decision
<!--- Explain the proposed change or action and the reason behind it. -->

## Consequences
<!--- Discuss the impact of this change, including what becomes easier or more complicated as a result. -->

Many README.md files have almost no meaningful content. For example, docker-registry/docs/user/resources/README.md:

# Resources

In this section, you can find the custom resources (CR) used in the Docker Registry module.

Maybe a follow-up PR to clean up these files?
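
One way such a cleanup could start is a path-based exclusion list applied to the fetched docs. The following is only a minimal sketch: the patterns, the `EXCLUDE_PATTERNS` name, and the `is_excluded` helper are assumptions for illustration and are not part of this PR.

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns derived from the files mentioned above.
# Note: in fnmatch, "*" also matches "/" so one pattern covers any module prefix.
EXCLUDE_PATTERNS = [
    "*/docs/release-notes/*",  # release notes
    "*/adr-template.md",       # empty ADR template
]


def is_excluded(path: str) -> bool:
    """Return True if a fetched doc path matches any exclusion pattern."""
    return any(fnmatch(path, pattern) for pattern in EXCLUDE_PATTERNS)
```

Near-empty README.md files cannot be caught by path alone, since other README.md files may be useful, so they would need a content-based check instead.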

mfaizanse commented 1 month ago

After fetching the docs, I found the following files, most of which can be excluded:

api-gateway/docs/release-notes: contains mostly release notes.

api-gateway/docs/contributor/adr/adr-template.md:

# <!--- Provide title -->

## Status
<!--- Specify the current state of the ADR, such as whether it is proposed, accepted, rejected, deprecated, superseded, etc. -->

## Context
<!--- Describe the issue or problem that is motivating this decision or change. -->

## Decision
<!--- Explain the proposed change or action and the reason behind it. -->

## Consequences
<!--- Discuss the impact of this change, including what becomes easier or more complicated as a result. -->

Many README.md files have almost no meaningful content. For example, docker-registry/docs/user/resources/README.md:

# Resources

In this section, you can find the custom resources (CR) used in the Docker Registry module.

Maybe a follow-up PR to clean up these files?

I have excluded some files. To automate this, though, we should add a check before we create embeddings for a chunk.
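
Such a check could look roughly like the sketch below. It is an assumption about what the check might do, not the implementation in this repository: it strips HTML comments and markdown headings from a chunk and counts the remaining words. The `has_meaningful_content` name and the `MIN_WORDS` threshold of 25 are hypothetical and would need tuning against the real docs.

```python
import re

# HTML comments, including multi-line ones such as the ADR template placeholders.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Markdown ATX headings ("# Title", "## Status", or a bare "#").
HEADING = re.compile(r"^[ \t]*#{1,6}(?:[ \t].*)?$", re.MULTILINE)

MIN_WORDS = 25  # assumed threshold, not taken from the PR


def has_meaningful_content(chunk: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the chunk has enough body text to be worth embedding."""
    text = HTML_COMMENT.sub("", chunk)  # drop template placeholders
    text = HEADING.sub("", text)        # drop headings, keep body text
    return len(text.split()) >= min_words
```

With this threshold, both the ADR template and the one-sentence Docker Registry README quoted above would be skipped, while a chunk with a few sentences of body text would pass.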