Closed sivakumar05 closed 3 years ago
Hi @sivakumar05, I appreciate your interest in the library. Could you please attach the PDF and the code you used to extract the text from it?
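Syntax:

    import pdfplumber

    filename = 'Vishwa_Srivastava_CV_Sep15.pdf'
    with pdfplumber.open(filename) as pdf:
        first_page = pdf.pages[0]
        # Lower-case the string directly; calling .lower() on the list
        # returned by .split('\n') would raise AttributeError.
        text = first_page.extract_text()
    text = text.lower()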
Thanks for sharing the PDF @sivakumar05. The `.extract_text(...)` method takes two optional arguments, `x_tolerance` and `y_tolerance`.

* `x_tolerance` - Adds a space where the difference between the `x1` of one character and the `x0` of the next is greater than `x_tolerance`. Defaults to 3.
* `y_tolerance` - Adds a newline character where the difference between the `doctop` of one character and the `doctop` of the next is greater than `y_tolerance`. Defaults to 3.

In your case, you can use a value smaller than 3, such as 1, for `x_tolerance`. With `page.extract_text(x_tolerance=1)`, the output becomes:

VISHWA SRIVASTAVA
Entrepreneur | Ex-Management Consultant
vishwa.srivastava25@gmail.com +91-9560677151 Bangalore, India
[ linkedin.com/in/vishwa-srivastava (cid:211) ‰
fl
PROFESSIONAL EXPERIENCE EXPOSURE & SKILLSETS
Co-Founder & CEO Capital Markets Wealth Management
Pvot.in |Stock Market Advisors Marketplace E-commerce Retail Industrial Goods
2018 – 2020 Bengaluru, India Metals Railways Oil & Gas
(cid:17) ‰
Built fully bootstrapped business from scratch, defined revenue model, Go-To-
•
Market Strategy for the product, drove customer and partner acquisition
GTM Strategy Fund Raising
Secured an investment term sheet at USD 1.06 Mn Pre-money valuation
•
On-boarded 50+ Experts and partnered with leading brokerages and P2P Product Management Program Mgmt.
•
lending companies on a revenue sharing agreement
Mix Panel Wireframing
Achieved >60% DAU among experts by building high engagement features
•
Marketing Strategy Market Assessment
Built & managed a 11 member team on tech, marketing, UX and content
•
Led daily scrum meetings with developers to progress on product roadmap User Research ASO
•
EDUCATION
Management Consultant
Accenture
MBA - Strategy & Operations
2019 – 2019 Bengaluru, India
Indian School of Business
(cid:17)EBITDA Improvement using Advanc‰ed Analytics | Metals
Identified & sized opportunities to apply machine learning models to improve 2016 – 2017
•
throughput & reduce cost for one of India’s largest steel manufacturer (cid:17)B.E. in Chemical Engineering
Deployed analytical models addressing opportunities worth USD 23 Mn
• MS University of Baroda
across Iron making value chain
2005 – 2009
Senior Consultant (cid:17)ACHIEVEMENTS
KPMG
2017 – 2019 Mumbai, India KPMG Kudos Award
(cid:17)Route to Market Strategy Transform‰ation | Retail & Industrial Goods Going extra mile to achieve desired re-
3
Designed new organizational structure to align with GTM strategies sults and building strong client relation-
•
ships (2018)
Set up marketing vertical and created Pan-India ATL/BTL activation plan
•
Conceptualized new product promotion schemes, conducted portfolio ratio- KPMG Super Team Award
•
nalization, ideated partner loyalty program and conducted vendor tie-ups Outstanding client service and excep-
(cid:143)
Designed remuneration – commission and incentives for distributors and tional team work (2017)
•
Sales Force, basis ROI & competitive benchmarking, helping win market 3 Commendation by Iraqi minister of
pct share across retail segments in Western & Southern zones
Natural Resources
Growth Strategy & Market Assessment | Metals & Railways
Ensuring zero downtime & ontime de-
Downstream opportunity identification & sizing for an Indian MNC livery during volatile conditions(2014)
•
Prepared investment proposal for shortlisted high growth and high EBITDA
• Honeywell Bravo Award
downstream value added sectors
Delivering outstanding customer ser-
Market Entry & Location Assessment | Petrochemicals 3
vice in Taiwan (2012)
Strategy for Indian entry via greenfield expansion for a South Korean client
• LANGUAGES
Process Consultant
Honeywell English
2009 - 2016 USA, India, EMEA, LatAm, SEA Hindi ○ ○ ○ ○ ○
(cid:17)20+ Operations improvement engag‰ements with global O&G Majors Spanish ○ ○ ○ ○ ○
○ ○ ○ ○ ○
Volunteered to lead 1st project for refinery capacity debottlenecking in Iraq
•
Regularly managed teams of 30-40 contract workers & engineers during op-
•
erationalization phase of engagements
Led intra-SBU team to commercialize Honeywell’s IoT based suite, optimizing
•
Upstream & Downstream operations
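
To make the effect of the two tolerances concrete, here is a minimal sketch of the rules as described in this comment. It is a simplified illustration, not pdfplumber's actual implementation: the `join_chars` helper and the sample character positions are hypothetical, and the characters are assumed to be in reading order already.

```python
def join_chars(chars, x_tolerance=3, y_tolerance=3):
    """Join character dicts into text using the two tolerance rules.

    Simplified illustration only (hypothetical helper, not pdfplumber code).
    Each char is a dict with "text", "x0", "x1" and "doctop" keys.
    """
    out = []
    prev = None
    for ch in chars:
        if prev is not None:
            if abs(ch["doctop"] - prev["doctop"]) > y_tolerance:
                # Vertical jump larger than y_tolerance: start a new line.
                out.append("\n")
            elif ch["x0"] - prev["x1"] > x_tolerance:
                # Horizontal gap larger than x_tolerance: insert a space.
                out.append(" ")
        out.append(ch["text"])
        prev = ch
    return "".join(out)


# Made-up characters: "ab", a 2-point gap, then "cd", all on one line.
chars = [
    {"text": "a", "x0": 0, "x1": 5, "doctop": 10},
    {"text": "b", "x0": 5, "x1": 10, "doctop": 10},
    {"text": "c", "x0": 12, "x1": 17, "doctop": 10},
    {"text": "d", "x0": 17, "x1": 22, "doctop": 10},
]

print(join_chars(chars))                 # gap of 2 is not > 3: "abcd"
print(join_chars(chars, x_tolerance=1))  # gap of 2 is > 1: "ab cd"
```

In a PDF whose word gaps are narrower than the default of 3, the first call shows why words run together, and the second shows why lowering `x_tolerance` brings the spaces back.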
Hi, how can I separate text that ends up on the same line? For example, in the output above, PROFESSIONAL EXPERIENCE and EXPOSURE & SKILLSETS are printed on the same line even though they are different headers in the PDF. How can I get them to look like this: (PROFESSIONAL EXPERIENCE \n EXPOSURE & SKILLSETS)? Thanks
@charan7799 You can use `y_tolerance` to add new lines.
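
If the two headers sit at the same height on the page, changing `y_tolerance` alone may not split them. One alternative, which was not suggested in this thread and assumes the page really is a two-column layout, is to crop the page into a left and a right region and extract each separately. In the sketch below, `column_bboxes`, `extract_columns`, and the `split_ratio` value are hypothetical and would need tuning for the actual PDF; `page.bbox`, `page.crop(...)`, and `extract_text(...)` are pdfplumber calls.

```python
def column_bboxes(page_bbox, split_ratio=0.6):
    """Split a page box (x0, top, x1, bottom) into left and right boxes.

    split_ratio is a guess at where the column boundary lies, expressed
    as a fraction of the page width; adjust it for your document.
    """
    x0, top, x1, bottom = page_bbox
    split_x = x0 + (x1 - x0) * split_ratio
    return (x0, top, split_x, bottom), (split_x, top, x1, bottom)


def extract_columns(path, split_ratio=0.6, x_tolerance=1):
    """Return (left_text, right_text) for the first page of a PDF."""
    import pdfplumber  # imported here so column_bboxes works without it

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[0]
        left_box, right_box = column_bboxes(page.bbox, split_ratio)
        # Crop each column and extract its text on its own.
        left = page.crop(left_box).extract_text(x_tolerance=x_tolerance) or ""
        right = page.crop(right_box).extract_text(x_tolerance=x_tolerance) or ""
    return left, right
```

Calling `extract_columns('Vishwa_Srivastava_CV_Sep15.pdf')` would then give the two columns as separate strings, which can be joined with `'\n'` if a single text is needed.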
Hi Jsvine\Others,
I'm using the 'pdfplumber' library to extract text from PDF files. It extracts the text correctly from every file except one. Details are below.
Issue: The extracted text has no spaces between words, although the spaces are present in the input file.
Syntax used to extract text:

    import pdfplumber

    filename = 'Vishwa_Srivastava_CV_Sep15.pdf'
    with pdfplumber.open(filename) as pdf:
        first_page = pdf.pages[0]
        # Lower-case the string directly; calling .lower() on the list
        # returned by .split('\n') would raise AttributeError.
        text = first_page.extract_text()
    text = text.lower()

Output:
'vishwa srivastava\nentrepreneur|ex-managementconsultant\nvishwa.srivastava25@gmail.com +91-9560677151 bangalore,india\nǜ linkedin.com/in/vishwa-srivastava (cid:211) ‰\nfl\nprofessional experience exposure & skillsets\nco-founder&ceo capitalmarkets wealthmanagement\npvot.in|stockmarketadvisorsmarketplace e-commerce retail industrialgoods\n2018–2020 bengaluru,india metals railways oil&gas\n(cid:17) ‰\nbuiltfullybootstrappedbusinessfromscratch,definedrevenuemodel,go-to-\n•\nmarketstrategyfortheproduct,drovecustomerandpartneracquisition\ngtmstrategy fundraising\nsecuredaninvestmenttermsheetatusd1.06mnpre-moneyvaluation\n•\non-boarded50+expertsandpartneredwithleadingbrokeragesandp2p productmanagement programmgmt.\n•\nlendingcompaniesonarevenuesharingagreement\nmixpanel wireframing\n
Please suggest the corrections my code needs so that the text is read with spaces between words.
Let me know if you need any additional details.
Thanks & Regards, Siva
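
Putting the `x_tolerance=1` suggestion from this thread together with the snippet in the question, a corrected version might look like the sketch below. The helper names `to_lower_lines` and `extract_first_page_lines` are hypothetical, and the value 1 is the one proposed for this particular PDF, not a general recommendation.

```python
def to_lower_lines(text):
    """Lower-case extracted text and split it into lines.

    Treats None (no text found on the page) as an empty string.
    """
    return (text or "").lower().split("\n")


def extract_first_page_lines(path, x_tolerance=1):
    """Extract the first page of a PDF as a list of lower-cased lines."""
    import pdfplumber  # imported here so to_lower_lines works without it

    with pdfplumber.open(path) as pdf:
        first_page = pdf.pages[0]
        # A smaller x_tolerance inserts spaces at narrower gaps.
        text = first_page.extract_text(x_tolerance=x_tolerance)
    return to_lower_lines(text)
```

Lower-casing happens before the split because `.lower()` is a string method and is not available on the list that `.split('\n')` returns.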