Closed takan1 closed 2 weeks ago
@takan1 Right now you can do it this way:
import os
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Assumes `crawler` is an already-created crawl4ai crawler instance.
result = crawler.run(r"https://www.nbcnews.com/business", word_count_threshold=0)

llm_extraction_strategy = LLMExtractionStrategy(
    provider="openai/gpt-4o-mini",
    api_token=os.getenv("OPENAI_API_KEY"),
    instruction="""Extract headers from this markdown content""",
)

# Run the strategy directly on the markdown instead of the raw HTML.
extraction_result = llm_extraction_strategy.run("", [result.markdown])
print(extraction_result)
However, this seems to me like a good option to add to the library.
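Until something like that exists, the workaround can be wrapped in a small helper. This is only a sketch: `extract_from_field` is a hypothetical name, not part of crawl4ai, and it assumes the crawl result exposes `markdown` and `cleaned_html` attributes and that the strategy accepts `run(url, sections)` as in the snippet above. `EchoStrategy` and `fake_result` are stand-ins so the example runs without an API key.

```python
from types import SimpleNamespace


def extract_from_field(result, strategy, field="markdown"):
    """Hypothetical helper: run a strategy on one text field of a crawl result."""
    if field not in ("markdown", "cleaned_html"):
        raise ValueError(f"unsupported field: {field!r}")
    content = getattr(result, field, None)
    if not content:
        raise ValueError(f"crawl result has no {field!r} content")
    # Same call shape as the workaround: empty URL, one section of text.
    return strategy.run("", [content])


class EchoStrategy:
    """Stand-in for LLMExtractionStrategy; it just echoes what it receives."""

    def run(self, url, sections):
        return [{"url": url, "content": section} for section in sections]


# Stand-in for the object returned by crawler.run(...).
fake_result = SimpleNamespace(
    markdown="# Headline",
    cleaned_html="<h1>Headline</h1>",
)

print(extract_from_field(fake_result, EchoStrategy(), field="cleaned_html"))
```

With the real library you would pass the `result` from `crawler.run(...)` and the `llm_extraction_strategy` instance in place of the stand-ins.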
Is it possible to extract with LLMExtractionStrategy from markdown or cleaned_html (Not from html)?