I am attempting to extract the "Products Affected" bullet points, along with each section and its text, such as "Indications -> All FDA-approved Indications."
I'm using horizontal_strategy: text, vertical_strategy: text, min_words_vertical: 2, and keep_blank_chars: 2.
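For context, here is a minimal sketch of how settings like these are typically passed to pdfplumber, assuming the tables are pulled with `page.extract_tables`. The helper name `extract_tables`, the page index, and the file path in the note below are illustrative assumptions, not taken from the actual script. The setting names are copied as written above and may need adjusting depending on the pdfplumber version.

```python
# Table settings as quoted in the post. The key names are copied verbatim;
# whether each one is accepted depends on the installed pdfplumber version.
TABLE_SETTINGS = {
    "horizontal_strategy": "text",
    "vertical_strategy": "text",
    "min_words_vertical": 2,
    "keep_blank_chars": 2,
}


def extract_tables(pdf_path, page_index=0, settings=TABLE_SETTINGS):
    """Return the tables found on one page, each as a list of rows of cell text.

    `extract_tables`, `pdf_path`, and `page_index` are hypothetical names
    chosen for this sketch.
    """
    # Imported here so the settings above can be inspected without the
    # third-party dependency being installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        return page.extract_tables(table_settings=settings)
```

Calling `extract_tables("med-preauth-drugs.pdf")` on a local copy of the document (the filename is a placeholder) and printing each row would produce the kind of output shown below.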
However, the result looks something like this:
AAT DEFICIENCY
Products Affected: •Prolastin-c
•Aralast Np INJ 1000MG, 500MG,•Zemaira
800MG
•Glassia
PA Criteria: Criteria Details
Indications: All FDA-approved Indications.
Off-Label Uses: N/A
Exclusion: N/A
Criteria
Required: DOC OF HIGH-RISK PHENOTYPE (E.G. PIZZ,PIZ(NULL),
Medical: PI(NULL)(NULL), PLASMA AAT LEVEL BELOW 11 MICROMOL/L
Information: (CORRESPONDING TO 80M EQUAL TO 35% AND LESS TO COMPLY WITH PROTOCG/D THA OLL) FEV1 GREATER THAN OR N 80% OF PREDICTED ABILITY FOR ADMINISTRATION
It looks like pdfplumber is completely disregarding both the structure of the "Products Affected" section and the lines in the "PA Criteria / Criteria Details" section, and is instead printing the text one line at a time.
Hi, I'm running into an issue with pdfplumber.
I am attempting to parse this document: https://healthalliance.org/Cms/Media?uri=https%3A%2F%2Fhealthalliance.org%2Fmedia%2Fresources%2Fmed-preauth-drugs.pdf
Is there something I'm doing wrong?
Thanks