Closed AurelienThiery closed 1 year ago
I can help with this. I can't really tell that much though from what I see here. I need more info on the dataframe and the sas dataset. What's to be seen in the CMD window seems to show df2sd() ran, and it seems to have terminated the SAS session without issue. But, I can't really tell without more info. The df has some index on it that won't be - can't be - included in the dataset, so depending upon what that is, it could account for a discrepancy; but I don't see anything about either the df or dataset to guess. Can you run the following and provide me the output (at the end of your current program after running df2sd)? You may need to wrap some of this w/ print() if you're running it in batch, but not if you run it interactively.
# change the line you have for df2sd to assign it to sd so we can reference the data set next:
# [...] all your program here but change the last line like this:
sd = sas.df2sd(df, 'sinistres_auth', 'datamart')
# after that run this
print(sas.lastlog())
df.dtypes
sas.df_char_lengths(df)
sd = sas.sasdata('sinistres_auth', 'datamart')
sd.columnInfo()
df.shape
sd.obs()
sd.contents(results='text')
With this I can at least see what the df is like and what the data set has in it.
Thanks, Tom
Hi Tom, The problem is the execution is blocked here (the next instruction are not executed) :
@.***
I can send you the pickle file if you want make test. The file is very large (> 4GB), i can send you by Crédit Agricole ftp website (named Artefis). If you are ok can you send me your email please ?
Best regards, Aurélien
De : Tom Weber @.> Envoyé : mardi 11 octobre 2022 18:18 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
I can help with this. I can't really tell that much though from what I see here. I need more info on the dataframe and the sas dataset. What's to be seen in the CMD window seems to show df2sd() ran, and it seems to have terminated the SAS session without issue. But, I can't really tell without more info. The df has some index on it that won't be - can't be - included in the dataset, so depending upon what that is, it could account for a discrepancy; but I don't see anything about either the df or dataset to guess. Can you run the following and provide me the output (at the end of your current program after running df2sd)? You may need to wrap some of this w/ print() if you're running it in batch, but not if you run it interactively.
sd = sas.df2sd(df, 'sinistres_auth', 'datamart')
print(sas.lastlog())
df.dtypes
sas.df_char_lengths(df)
sd = sas.sasdata('sinistres_auth', 'datamart')
sd.columnInfo()
df.shape
sd.obs()
sd.contents(results='text')
With this I can at least see what the df is like and what the data set has in it.
Thanks, Tom
— Reply to this email directly, view it on GitHubhttps://github.com/sassoftware/saspy/issues/487#issuecomment-1274951211, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD5A4KDQAAGUBPUV3WFDMZDWCWHKFANCNFSM6AAAAAARCNQKGQ. You are receiving this because you authored the thread.Message ID: @.**@.>>
Sure, I can try that out. My email is on my account here, but here it is too: tom.weber@sas.com.
Ok perfect!I send you the file tomorrow.You are working in Cary?AurélienLe 12 oct. 2022 à 17:27, Tom Weber @.> a écrit : Sure, I can try that out. My email is on my account here, but here it is too: @.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>
Yes, I'm in Cary. I'm not sure where you said it stopped. If it's on the df2sd, something you can do to just get the info is to change it to just use the first n rows; df[:1000] So if you just play with that, it would need to be for these 2 methods:
sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart') sas.df_char_lengths(df[:1000])
The other references using 'df' should be ok.
Also, just playing with this a little yesterday, I did df = sas.sd2df('cars','sashelp') followed by df = df.append(df) till I had 7 million rows or so, and then did sd = sas.df2sd(df) and I never saw the memory go above maybe 70k; I looked at taskmanager, like you showed as well as using process explorer; they both showed equivalent cpu and memory, and I/O for the process. So, I'm not sure where your 30G of memory could be coming from either. That doesn't make any sense to me yet.
But, we'll get to the bottom of it.
One question is about the drives you're using. SAS is on C:, while WORK is on W: and the libref points to H:. Are these actual drives (or partitions) on your actual PC? Or are they remote mounted volumes? Just trying to come up with some guesses as to what you're seeing w/ that memory consumption. Not that I have a correlation with drives; it's just something that jumped out at me looking at the output.
Thanks, Tom
Ok i will send you the file tonight (for me).I answer your questions tomorrow.AurélienLe 12 oct. 2022 à 17:50, Tom Weber @.***> a écrit : Yes, I'm in Cary. I'm not sure where you said it stopped. If it's on the df2sd, something you can do to just get the info is to change it to just use the first n rows; df[:1000] So if you just play with that, it would need to be for these 2 methods: sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart') sas.df_char_lengths(df[:1000]) The other references using 'df' should be ok. Also, just playing with this a little yesterday, I did df = sas.sd2df('cars','sashelp') followed by df = df.append(df) till I had 7 million rows or so, and then did sd = sas.df2sd(df) and I never saw the memory go above maybe 70k; I looked at taskmanager, like you showed as well as using process explorer; they both showed equivalent cpu and memory, and I/O for the process. So, I'm not sure where your 30G of memory could be coming from either. That doesn't make any sense to me yet. But, we'll get to the bottom of it. One question is about the drives you're using. SAS is on C:, while WORK is on W: and the libref points to H:. Are these actual drives (or partitions) on your actual PC? Or are they remote mounted volumes? Just trying to come up with some guesses as to what you're seeing w/ that memory consumption. Not that I have a correlation with drives; it's just something that jumped out at me looking at the output. Thanks, Tom
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>
Tom,
For the tests I’m on my windows PC and processing (python script) connects to the SAS server (all PCs and servers are in the same domain without firewall). But the target is to execute processing on a linux server with docker and we have the same result.
The SAS Server is on Windows 2019 (4 CPU and 64GB RAM in development environment, 16 CPU and 128 GB RAM in production) and SAS version is 9.4 M7. For the tests i changed the sasv9.cfg to increase RAM limit to 32GB.
On the SAS server all the disks are remote mounted volumes (netapp full flash)
To finish i specify we have no problem with small dataframe.
I send you the pickle file with our internal solution : Artefis.
Aurélien
PS : sorry for my english L
De : Tom Weber @.> Envoyé : mercredi 12 octobre 2022 17:50 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Yes, I'm in Cary. I'm not sure where you said it stopped. If it's on the df2sd, something you can do to just get the info is to change it to just use the first n rows; df[:1000] So if you just play with that, it would need to be for these 2 methods:
sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart') sas.df_char_lengths(df[:1000])
The other references using 'df' should be ok.
Also, just playing with this a little yesterday, I did df = sas.sd2df('cars','sashelp') followed by df = df.append(df) till I had 7 million rows or so, and then did sd = sas.df2sd(df) and I never saw the memory go above maybe 70k; I looked at taskmanager, like you showed as well as using process explorer; they both showed equivalent cpu and memory, and I/O for the process. So, I'm not sure where your 30G of memory could be coming from either. That doesn't make any sense to me yet.
But, we'll get to the bottom of it.
One question is about the drives you're using. SAS is on C:, while WORK is on W: and the libref points to H:. Are these actual drives (or partitions) on your actual PC? Or are they remote mounted volumes? Just trying to come up with some guesses as to what you're seeing w/ that memory consumption. Not that I have a correlation with drives; it's just something that jumped out at me looking at the output.
Thanks, Tom
— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276395310 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KDW4LSP23FSLRYRJCDWC3MZBANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KB4HZ2A5RO66WIKHXLWC3MZBA5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMCQ7S4.gif Message ID: @. @.> >
Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom
I have python 3.10.7 on my PC with 16GB RAM. I can load the pickle file in dataframe when i close all applications.Le 12 oct. 2022 à 21:43, Tom Weber @.***> a écrit : Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>
Hi Tom,
I executed this script (no problem), the file log is in this mail.
Best regards,
Aurélien
De : Tom Weber @.> Envoyé : mercredi 12 octobre 2022 21:43 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom
— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276654410 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KGDDUL56I3TVTLW6U3WC4ID5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KDXBV2IVDKPTWCNDGTWC4ID5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMDAZUU.gif Message ID: @. @.> >
With the log ;-)
De : THIERY Aurelien Envoyé : jeudi 13 octobre 2022 10:10 À : 'sassoftware/saspy' @.>; sassoftware/saspy @.> Cc : Author @.***> Objet : RE: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Hi Tom,
I executed this script (no problem), the file log is in this mail.
Best regards,
Aurélien
De : Tom Weber @. @.> > Envoyé : mercredi 12 octobre 2022 21:43 À : sassoftware/saspy @. @.> > Cc : THIERY Aurelien @. @.> >; Author @. @.> > Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom
— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276654410 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KGDDUL56I3TVTLW6U3WC4ID5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KDXBV2IVDKPTWCNDGTWC4ID5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMDAZUU.gif Message ID: @. @.> >
Hey, assuming the log was an attachment. It didn't come through. Maybe a direct email to me? I don't see anything saying an attachment was stripped out, or anything like that.
From: AurelienThiery @.> Sent: Thursday, October 13, 2022 5:30 AM To: sassoftware/saspy @.> Cc: Tom Weber @.>; Comment @.> Subject: Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
EXTERNAL With the log ;-)
De : THIERY Aurelien Envoyé : jeudi 13 octobre 2022 10:10 À : 'sassoftware/saspy' @.<mailto:@.>>; sassoftware/saspy @.<mailto:@.>> Cc : Author @.<mailto:@.>> Objet : RE: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Hi Tom,
I executed this script (no problem), the file log is in this mail.
Best regards,
Aurélien
De : Tom Weber @.<mailto:@.> @.<mailto:@.>> > Envoyé : mercredi 12 octobre 2022 21:43 À : sassoftware/saspy @.<mailto:@.> @.<mailto:@.>> > Cc : THIERY Aurelien @.<mailto:@.> @.<mailto:@.>> >; Author @.<mailto:@.> @.<mailto:@.>> > Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom
- Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276654410 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KGDDUL56I3TVTLW6U3WC4ID5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KDXBV2IVDKPTWCNDGTWC4ID5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMDAZUU.gif Message ID: @.<mailto:@.> @.<mailto:@.>> >
- Reply to this email directly, view it on GitHubhttps://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsassoftware%2Fsaspy%2Fissues%2F487%23issuecomment-1277311147&data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084426200%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LvjCdLwKMg%2Fw58QHd67t36DsjRZx6fiqxkcRQFWZvHg%3D&reserved=0, or unsubscribehttps://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEHDYZWGEEFNK2VUQFVYXB3WC7JBXANCNFSM6AAAAAARCNQKGQ&data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084582433%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=eTXQGHxwihP%2BKhxAHH1vKX2cK78R5FYfUx7m5LcJgTw%3D&reserved=0. You are receiving this because you commented.Message ID: @.**@.>>
Ok i send you the log by ftp website.
De : Tom Weber @.> Envoyé : jeudi 13 octobre 2022 15:16 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Hey, assuming the log was an attachment. It didn't come through. Maybe a direct email to me? I don't see anything saying an attachment was stripped out, or anything like that.
From: AurelienThiery @. <mailto:@.> > Sent: Thursday, October 13, 2022 5:30 AM To: sassoftware/saspy @. <mailto:@.> > Cc: Tom Weber @. <mailto:@.> >; Comment @. <mailto:@.> > Subject: Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
EXTERNAL With the log ;-)
De : THIERY Aurelien Envoyé : jeudi 13 octobre 2022 10:10 À : 'sassoftware/saspy' @. <mailto:@.%3cmailto:@.> <mailto:@.>>; sassoftware/saspy @. <mailto:@.%3cmailto:@.> <mailto:@.>> Cc : Author @. <mailto:@.%3cmailto:@.> <mailto:@.>> Objet : RE: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Hi Tom,
I executed this script (no problem), the file log is in this mail.
Best regards,
Aurélien
De : Tom Weber @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> > Envoyé : mercredi 12 octobre 2022 21:43 À : sassoftware/saspy @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> > Cc : THIERY Aurelien @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> >; Author @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> > Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom
- Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276654410 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KGDDUL56I3TVTLW6U3WC4ID5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KDXBV2IVDKPTWCNDGTWC4ID5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMDAZUU.gif Message ID: @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> >
- Reply to this email directly, view it on GitHub<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsassoftware%2Fsaspy%2Fissues%2F487%23issuecomment-1277311147 https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsassoftware%2Fsaspy%2Fissues%2F487%23issuecomment-1277311147&data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084426200%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LvjCdLwKMg%2Fw58QHd67t36DsjRZx6fiqxkcRQFWZvHg%3D&reserved=0 &data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084426200%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LvjCdLwKMg%2Fw58QHd67t36DsjRZx6fiqxkcRQFWZvHg%3D&reserved=0>, or unsubscribe<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEHDYZWGEEFNK2VUQFVYXB3WC7JBXANCNFSM6AAAAAARCNQKGQ https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEHDYZWGEEFNK2VUQFVYXB3WC7JBXANCNFSM6AAAAAARCNQKGQ&data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084582433%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=eTXQGHxwihP%2BKhxAHH1vKX2cK78R5FYfUx7m5LcJgTw%3D&reserved=0 &data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084582433%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=eTXQGHxwihP%2BKhxAHH1vKX2cK78R5FYfUx7m5LcJgTw%3D&reserved=0>. You are receiving this because you commented.Message ID: @.**@. mailto:***@***.******@***.*** >>
— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1277592225 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KFDQO7PW33C3YE5Z6TWDADP5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KBVDKG22CWTPZY3LYDWDADP5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSME2BKC.gif Message ID: @. @.> >
cool, thanks! got it. Now let me analyze it and see what I see!
I don't see any of the output for the dataframe info. Can you just run python interactively and submit all the code, then I can actually see what code was executed and get all of the output.
>>> python
import pandas as pd
import saspy
print('Chargement du dataframe...')
df = pd.read_pickle("c:\\temp\\sinis.pkl")
print('NB lignes dataframe:', len(df.index))
print('Ouverture de la session SAS sur le serveur...')
sas = saspy.SASsession(cfgfile='C:\\Users\\aurelien.thiery\\PycharmProjects\\sascfg_personal.py')
print(sas)
print('Allocation de la bibliothèque...')
sas.saslib(libref='datamart', path='H:\\DataProj\DFICMS\\02_Datamarts\\202204')
print('Conversion du dataframe en table SAS...')
sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart')
print(sas.lastlog())
df.shape
df.info()
sas.df_char_lengths(df[:1000])
sd.columnInfo()
df.shape
sd.obs()
sd.contents(results='text')
print(sas.saslog())
Also, I'd be curious to know the following (on the full df):
import time
start = time.time()
cl = sas.df_char_lengths(df)
print('time= ', time.time()-start)
print(cl)
Thanks! Tom
Tom,
The instructions in green :
C:\Users\aurelien.thiery\PycharmProjects\pythonProject1\venv\Scripts\python.exe "C:/Program Files/JetBrains/PyCharm Community Edition 2022.2.2/plugins/python-ce/helpers/pydev/pydevconsole.py" --mode=client --host=127.0.0.1 --port=63795
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['C:\Users\aurelien.thiery\PycharmProjects\pythonProject1'])
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 8.5.0
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] on win32
import pandas as pd
import saspy
print('Chargement du dataframe...')
Chargement du dataframe...
df = pd.read_pickle("c:\temp\sinis.pkl")
print('NB lignes dataframe:', len(df.index))
NB lignes dataframe: 18437883
print('Ouverture de la session SAS sur le serveur...')
Ouverture de la session SAS sur le serveur...
sas = saspy.SASsession(cfgfile='C:\Users\aurelien.thiery\PycharmProjects\sascfg_personal.py')
Using SAS Config named: winiomIWA
SAS Connection established. Subprocess id is 20904
print(sas)
Access Method = IOM
SAS Config name = winiomIWA
SAS Config file = C:\Users\aurelien.thiery\PycharmProjects\sascfg_personal.py
WORK Path = W:\WorkSAS_TD23084S00SAS017\Prc2\
SAS Version = 9.04.01M7P08052020
SASPy Version = 4.3.3
Teach me SAS = False
Batch = False
Results = Pandas
SAS Session Encoding = wlatin1
Python Encoding value = cp1252
SAS process Pid value = 23084
print('Allocation de la bibliothèque...')
Allocation de la bibliothèque...
sas.saslib(libref='datamart', path='H:\DataProj\DFICMS\02_Datamarts\202204')
5 The SAS System 16:51 Thursday, October 13, 2022
24
25 libname datamart 'H:\DataProj\DFICMS\02_Datamarts\202204' ;
NOTE: Libref DATAMART was successfully assigned as follows:
Engine: V9
Physical Name: H:\DataProj\DFICMS\02_Datamarts\202204
26
27
28
6 The SAS System 16:51 Thursday, October 13, 2022
29
print('Conversion du dataframe en table SAS...')
Conversion du dataframe en table SAS...
sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart')
C:\Users\aurelien.thiery\PycharmProjects\pythonProject1\venv\lib\site-packages\saspy\sasioiom.py:1532: UserWarning: Note that Indexes are not transferred over as columns. Only actual columns are transferred
warnings.warn("Note that Indexes are not transferred over as columns. Only actual columns are transferred")
print(sas.lastlog())
7 The SAS System 16:51 Thursday, October 13, 2022
32 ;';";*/;
33
34 data null; retain libref; retain cobs 1;
35 set sashelp.vlibnam end=last;
36 if cobs EQ 1 then
37 put "LIBREFSSTART=";
38 cobs = 2;
39 if libref NE libname then
40 put %upcase("lib=") libname %upcase('libEND=');
41 libref = libname;
42 if last then
43 put "LIBREFSEND=";
44 run;
LIBREFSSTART=
LIB=WORK LIBEND=
LIB=DATAMART LIBEND=
LIB=SASDATA LIBEND=
LIB=O21_0822 LIBEND=
LIB=O22_0822 LIBEND=
LIB=REL0822 LIBEND=
LIB=O12_0122 LIBEND=
LIB=O13_0122 LIBEND=
LIB=O14_0122 LIBEND=
LIB=O15_0122 LIBEND=
LIB=O16_0122 LIBEND=
LIB=O17_0122 LIBEND=
LIB=O18_0122 LIBEND=
LIB=O19_0122 LIBEND=
LIB=O20_0122 LIBEND=
LIB=O21_0122 LIBEND=
LIB=DGT0622 LIBEND=
LIB=IFRHIS4 LIBEND=
LIB=IFRHIS LIBEND=
LIB=IFRACT LIBEND=
LIB=O11_0121 LIBEND=
LIB=O12_0121 LIBEND=
LIB=O13_0121 LIBEND=
LIB=O14_0121 LIBEND=
LIB=O15_0121 LIBEND=
LIB=O16_0121 LIBEND=
LIB=O17_0121 LIBEND=
LIB=O18_0121 LIBEND=
LIB=O19_0121 LIBEND=
LIB=TEST LIBEND=
LIB=OSA2020 LIBEND=
LIB=OSA2021 LIBEND=
LIB=NBO2022 LIBEND=
LIB=GEO1221 LIBEND=
LIB=O20_0821 LIBEND=
LIB=O21_0821 LIBEND=
LIB=OSADTM LIBEND=
LIB=PCRM2022 LIBEND=
LIB=O20_0121 LIBEND=
LIB=PCRH2021 LIBEND=
LIB=PCRH2022 LIBEND=
LIB=PCRM2021 LIBEND=
LIB=SIMWRK LIBEND=
LIB=SM9WRK LIBEND=
LIB=SVUWRK LIBEND=
LIB=SSTWRK LIBEND=
LIB=SNHWRK LIBEND=
LIB=SA4WRK LIBEND=
LIB=SA2WRK LIBEND=
LIB=SAP0821 LIBEND=
LIB=SAP0721 LIBEND=
LIB=STPSAMP LIBEND=
LIB=SASHELP LIBEND=
LIB=MAPS LIBEND=
LIB=MAPSSAS LIBEND=
LIB=MAPSGFK LIBEND=
LIB=SASUSER LIBEND=
LIBREFSEND=
NOTE: There were 952 observations read from the data set SASHELP.VLIBNAM.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.03 seconds
45
46
47 ;';";*/;
48
8 The SAS System 16:51 Thursday, October 13, 2022
49
50 %put E3969440A681A2408885998500000004;
E3969440A681A2408885998500000004
51
9 The SAS System 16:51 Thursday, October 13, 2022
52 data datamart.'sinistres_auth'n;
53 length 'SIN'n $10 'DBT_VALIDITE'n 8 'FIN_VALIDITE'n 8 'CR'n $3 'DATE_OUV'n 8 'DATE_SURV'n 8 'DATE_CLOS'n 8 'AN_RECOLTE'n
53 ! $3 'EVTNAT'n $5 'TYPE_GES'n $3 'ETATSIN'n $1 'LIEU'n $5 'PJ'n 8 'TOPMIG'n $3 'DATEOBS'n 8 'EVAL_BRUTE'n 8
53 ! 'EVAL_RECOURS'n 8 'EVAL_NET'n 8 'REG_PRINC'n 8 'REG_FRAIS'n 8 'REG_HONOR'n 8 'REG_RECOURS'n 8 'REG_NET'n 8 'RES_NET'n 8
53 ! 'GESTE_MONT'n 8 'ANNEE_SURV'n $4 'ANNEE_OUV'n $4 'ANNEE_CLOS'n $4 'ANNEE_OBS'n $4 'PRODUIT'n $2 'EVAL_REM'n 8 'CHARGE'n 8
53 ! ;
54 format 'DBT_VALIDITE'n E8601DT26.6 'FIN_VALIDITE'n E8601DT26.6 'DATE_OUV'n E8601DT26.6 'DATE_SURV'n E8601DT26.6
54 ! 'DATE_CLOS'n E8601DT26.6 'DATEOBS'n E8601DT26.6 ;
55 infile datalines delimiter='03'x STOPOVER;
56 input @;
57 if infile = '' then delete;
58 else do;
59 input 'SIN'n ;
60 input 'DBT_VALIDITE'n :B8601DT26.6 ;
61 input 'FIN_VALIDITE'n :B8601DT26.6 ;
62 input 'CR'n ;
63 input 'DATE_OUV'n :B8601DT26.6 ;
64 input 'DATE_SURV'n :B8601DT26.6 ;
65 input 'DATE_CLOS'n :B8601DT26.6 ;
66 input 'AN_RECOLTE'n ;
67 input 'EVTNAT'n ;
68 input 'TYPE_GES'n ;
69 input 'ETATSIN'n ;
70 input 'LIEU'n ;
71 input 'PJ'n ;
72 input 'TOPMIG'n ;
73 input 'DATEOBS'n :B8601DT26.6 ;
74 input 'EVAL_BRUTE'n ;
75 input 'EVAL_RECOURS'n ;
76 input 'EVAL_NET'n ;
77 input 'REG_PRINC'n ;
78 input 'REG_FRAIS'n ;
79 input 'REG_HONOR'n ;
80 input 'REG_RECOURS'n ;
81 input 'REG_NET'n ;
82 input 'RES_NET'n ;
83 input 'GESTE_MONT'n ;
84 input 'ANNEE_SURV'n ;
85 input 'ANNEE_OUV'n ;
86 input 'ANNEE_CLOS'n ;
87 input 'ANNEE_OBS'n ;
88 input 'PRODUIT'n ;
89 input 'EVAL_REM'n ;
90 input 'CHARGE'n ;
91 'SIN'n = translate('SIN'n, '0a'X, '01'x);
92 'SIN'n = translate('SIN'n, '0D'x, '02'x );
93 'CR'n = translate('CR'n, '0a'X, '01'x);
94 'CR'n = translate('CR'n, '0D'x, '02'x );
95 'AN_RECOLTE'n = translate('AN_RECOLTE'n, '0a'X, '01'x);
96 'AN_RECOLTE'n = translate('AN_RECOLTE'n, '0D'x, '02'x );
97 'EVTNAT'n = translate('EVTNAT'n, '0a'X, '01'x);
98 'EVTNAT'n = translate('EVTNAT'n, '0D'x, '02'x );
99 'TYPE_GES'n = translate('TYPE_GES'n, '0a'X, '01'x);
100 'TYPE_GES'n = translate('TYPE_GES'n, '0D'x, '02'x );
101 'ETATSIN'n = translate('ETATSIN'n, '0a'X, '01'x);
102 'ETATSIN'n = translate('ETATSIN'n, '0D'x, '02'x );
103 'LIEU'n = translate('LIEU'n, '0a'X, '01'x);
104 'LIEU'n = translate('LIEU'n, '0D'x, '02'x );
105 'TOPMIG'n = translate('TOPMIG'n, '0a'X, '01'x);
106 'TOPMIG'n = translate('TOPMIG'n, '0D'x, '02'x );
107 'ANNEE_SURV'n = translate('ANNEE_SURV'n, '0a'X, '01'x);
108 'ANNEE_SURV'n = translate('ANNEE_SURV'n, '0D'x, '02'x );
109 'ANNEE_OUV'n = translate('ANNEE_OUV'n, '0a'X, '01'x);
110 'ANNEE_OUV'n = translate('ANNEE_OUV'n, '0D'x, '02'x );
111 'ANNEE_CLOS'n = translate('ANNEE_CLOS'n, '0a'X, '01'x);
112 'ANNEE_CLOS'n = translate('ANNEE_CLOS'n, '0D'x, '02'x );
113 'ANNEE_OBS'n = translate('ANNEE_OBS'n, '0a'X, '01'x);
114 'ANNEE_OBS'n = translate('ANNEE_OBS'n, '0D'x, '02'x );
115 'PRODUIT'n = translate('PRODUIT'n, '0a'X, '01'x);
116 'PRODUIT'n = translate('PRODUIT'n, '0D'x, '02'x );
117 ;
118 end;
119 datalines4;
10 The SAS System 16:51 Thursday, October 13, 2022
NOTE: DATA statement used (Total process time):
real time 0.16 seconds
cpu time 0.14 seconds
NOTE: The data set DATAMART.SINISTRES_AUTH has 1000 observations and 32 variables.
NOTE: Compressing data set DATAMART.SINISTRES_AUTH decreased size by 0.00 percent.
Compressed is 4 pages; un-compressed would require 4 pages.
32139 ;;;;
32140 ;;;;
32141
11 The SAS System 16:51 Thursday, October 13, 2022
32142 ;';";*/;
32143 quit;
32144
32145 ;';";*/;
32146
12 The SAS System 16:51 Thursday, October 13, 2022
32147
32148 %put E3969440A681A2408885998500000005;
E3969440A681A2408885998500000005
32149
13 The SAS System 16:51 Thursday, October 13, 2022
32150 ;';";*/;
32151 data null; e = exist("datamart.'sinistres_auth'n");
32152 v = exist("datamart.'sinistres_auth'n", 'VIEW');
32153 if e or v then e = 1;
32154 te='TABLE_EXISTS='; put te e;run;
TABLE_EXISTS= 1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32155
32156
32157 ;';";*/;
32158
14 The SAS System 16:51 Thursday, October 13, 2022
32159
32160 %put E3969440A681A2408885998500000006;
E3969440A681A2408885998500000006
32161
15 The SAS System 16:51 Thursday, October 13, 2022
32162 ;';";*/;
32163 %let engine=BAD;
32164 proc sql;
32164 ! select distinct engine into :engine from sashelp.VLIBNAM where libname = 'DATAMART';
32164 ! ;%put engstart=&engine
32164 ! engend=;
engstart=V9 engend=
32165 quit;
NOTE: The PROCEDURE SQL printed page 1.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.03 seconds
32166
32167 ;';";*/;
32168
16 The SAS System 16:51 Thursday, October 13, 2022
32169
32170 %put E3969440A681A2408885998500000007;
E3969440A681A2408885998500000007
32171
df.shape
Out[15]: (18437883, 32)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18437883 entries, 0 to 380710
Data columns (total 32 columns):
0 SIN object
1 DBT_VALIDITE datetime64[ns]
2 FIN_VALIDITE datetime64[ns]
3 CR object
4 DATE_OUV datetime64[ns]
5 DATE_SURV datetime64[ns]
6 DATE_CLOS datetime64[ns]
7 AN_RECOLTE object
8 EVTNAT object
9 TYPE_GES object
10 ETATSIN object
11 LIEU object
12 PJ float64
13 TOPMIG object
14 DATEOBS datetime64[ns]
15 EVAL_BRUTE float64
16 EVAL_RECOURS float64
17 EVAL_NET float64
18 REG_PRINC float64
19 REG_FRAIS float64
20 REG_HONOR float64
21 REG_RECOURS float64
22 REG_NET float64
23 RES_NET float64
24 GESTE_MONT float64
25 ANNEE_SURV object
26 ANNEE_OUV object
27 ANNEE_CLOS object
28 ANNEE_OBS object
29 PRODUIT object
30 EVAL_REM float64
31 CHARGE float64
dtypes: datetime64ns, float64(13), object(13)
memory usage: 4.5+ GB
sas.df_char_lengths(df[:1000])
Out[17]:
{'SIN': 10,
'CR': 3,
'AN_RECOLTE': 3,
'EVTNAT': 5,
'TYPE_GES': 3,
'ETATSIN': 1,
'LIEU': 5,
'TOPMIG': 3,
'ANNEE_SURV': 4,
'ANNEE_OUV': 4,
'ANNEE_CLOS': 4,
'ANNEE_OBS': 4,
'PRODUIT': 2}
sd.columnInfo()
Out[18]:
Member Num Variable Type Len Pos Format
0 DATAMART.SINISTRES_AUTH 28.0 ANNEE_CLOS Char 4.0 193.0 NaN
1 DATAMART.SINISTRES_AUTH 29.0 ANNEE_OBS Char 4.0 197.0 NaN
2 DATAMART.SINISTRES_AUTH 27.0 ANNEE_OUV Char 4.0 189.0 NaN
3 DATAMART.SINISTRES_AUTH 26.0 ANNEE_SURV Char 4.0 185.0 NaN
4 DATAMART.SINISTRES_AUTH 8.0 AN_RECOLTE Char 3.0 165.0 NaN
5 DATAMART.SINISTRES_AUTH 32.0 CHARGE Num 8.0 144.0 NaN
6 DATAMART.SINISTRES_AUTH 4.0 CR Char 3.0 162.0 NaN
7 DATAMART.SINISTRES_AUTH 15.0 DATEOBS Num 8.0 48.0 E8601DT26.6
8 DATAMART.SINISTRES_AUTH 7.0 DATE_CLOS Num 8.0 32.0 E8601DT26.6
9 DATAMART.SINISTRES_AUTH 5.0 DATE_OUV Num 8.0 16.0 E8601DT26.6
10 DATAMART.SINISTRES_AUTH 6.0 DATE_SURV Num 8.0 24.0 E8601DT26.6
11 DATAMART.SINISTRES_AUTH 2.0 DBT_VALIDITE Num 8.0 0.0 E8601DT26.6
12 DATAMART.SINISTRES_AUTH 11.0 ETATSIN Char 1.0 176.0 NaN
13 DATAMART.SINISTRES_AUTH 16.0 EVAL_BRUTE Num 8.0 56.0 NaN
14 DATAMART.SINISTRES_AUTH 18.0 EVAL_NET Num 8.0 72.0 NaN
15 DATAMART.SINISTRES_AUTH 17.0 EVAL_RECOURS Num 8.0 64.0 NaN
16 DATAMART.SINISTRES_AUTH 31.0 EVAL_REM Num 8.0 136.0 NaN
17 DATAMART.SINISTRES_AUTH 9.0 EVTNAT Char 5.0 168.0 NaN
18 DATAMART.SINISTRES_AUTH 3.0 FIN_VALIDITE Num 8.0 8.0 E8601DT26.6
19 DATAMART.SINISTRES_AUTH 25.0 GESTE_MONT Num 8.0 128.0 NaN
20 DATAMART.SINISTRES_AUTH 12.0 LIEU Char 5.0 177.0 NaN
21 DATAMART.SINISTRES_AUTH 13.0 PJ Num 8.0 40.0 NaN
22 DATAMART.SINISTRES_AUTH 30.0 PRODUIT Char 2.0 201.0 NaN
23 DATAMART.SINISTRES_AUTH 20.0 REG_FRAIS Num 8.0 88.0 NaN
24 DATAMART.SINISTRES_AUTH 21.0 REG_HONOR Num 8.0 96.0 NaN
25 DATAMART.SINISTRES_AUTH 23.0 REG_NET Num 8.0 112.0 NaN
26 DATAMART.SINISTRES_AUTH 19.0 REG_PRINC Num 8.0 80.0 NaN
27 DATAMART.SINISTRES_AUTH 22.0 REG_RECOURS Num 8.0 104.0 NaN
28 DATAMART.SINISTRES_AUTH 24.0 RES_NET Num 8.0 120.0 NaN
29 DATAMART.SINISTRES_AUTH 1.0 SIN Char 10.0 152.0 NaN
30 DATAMART.SINISTRES_AUTH 14.0 TOPMIG Char 3.0 182.0 NaN
31 DATAMART.SINISTRES_AUTH 10.0 TYPE_GES Char 3.0 173.0 NaN
sd.obs()
Out[19]: 1000
sd.contents(results='text')
The SAS System 16:51 Thursday, October 13, 2022 3
The CONTENTS Procedure
Data Set Name DATAMART.SINISTRES_AUTH Observations 1000
Member Type DATA Variables 32
Engine V9 Indexes 0
Created 13/10/2022 16:52:02 Observation Length 203
Last Modified 13/10/2022 16:52:02 Deleted Observations 0
Protection Compressed CHAR
Data Set Type Reuse Space NO
Label Point to Observations YES
Data Representation WINDOWS_64 Sorted NO
Encoding wlatin1 Western (Windows)
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 4
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename H:\DataProj\DFICMS\02_Datamarts\202204\sinistres_auth.sas7bdat
Release Created 9.0401M7
Host Created X64_SRV19
Owner Name CA-PACIFICA\aurelien.thiery
File Size 320KB
File Size (bytes) 327680
Alphabetic List of Variables and Attributes
# Variable Type Len Format
28 ANNEE_CLOS Char 4
29 ANNEE_OBS Char 4
27 ANNEE_OUV Char 4
26 ANNEE_SURV Char 4
8 AN_RECOLTE Char 3
32 CHARGE Num 8
4 CR Char 3
15 DATEOBS Num 8 E8601DT26.6
7 DATE_CLOS Num 8 E8601DT26.6
5 DATE_OUV Num 8 E8601DT26.6
6 DATE_SURV Num 8 E8601DT26.6
2 DBT_VALIDITE Num 8 E8601DT26.6
11 ETATSIN Char 1
16 EVAL_BRUTE Num 8
18 EVAL_NET Num 8
17 EVAL_RECOURS Num 8
31 EVAL_REM Num 8
9 EVTNAT Char 5
3 FIN_VALIDITE Num 8 E8601DT26.6
25 GESTE_MONT Num 8
12 LIEU Char 5
13 PJ Num 8
30 PRODUIT Char 2
20 REG_FRAIS Num 8
21 REG_HONOR Num 8
23 REG_NET Num 8
19 REG_PRINC Num 8
22 REG_RECOURS Num 8
24 RES_NET Num 8
1 SIN Char 10
14 TOPMIG Char 3
10 TYPE_GES Char 3
print(sas.saslog())
1 The SAS System 16:51 Thursday, October 13, 2022
NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M7)
Licensed to PACIFICA - SERVEUR DE DEVELOPPEMENT, Site 50110812.
NOTE: This session is executing on the X64_SRV19 platform.
NOTE: Analytical products:
SAS/STAT 15.2
NOTE: Additional host information:
X64_SRV19 WIN 10.0.17763 Server
NOTE: SAS Initialization used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: The autoexec file, I:\SASConfig\Lev1\Infocentre\WorkspaceServer\autoexec.sas, was executed at server initialization.
1 ;';";*/;
2 options svgtitle='svgtitle'; options validvarname=any validmemname=extend pagesize=max nosyntaxcheck; ods graphics on;
3
4 ;';";*/;
5
2 The SAS System 16:51 Thursday, October 13, 2022
6
7 %put E3969440A681A2408885998500000001;
E3969440A681A2408885998500000001
8
3 The SAS System 16:51 Thursday, October 13, 2022
9 ;';";*/;
10 data null; length x $ 4096;
11 x = resolve('%sysfunc(pathname(work))'); put 'WORKPATH=' x 'WORKPATHEND=';
12 x = resolve('&SYSENCODING'); put 'ENCODING=' x 'ENCODINGEND=';
13 x = resolve('&SYSVLONG4'); put 'SYSVLONG=' x 'SYSVLONGEND=';
14 x = resolve('&SYSJOBID'); put 'SYSJOBID=' x 'SYSJOBIDEND=';
15 x = resolve('&SYSSCP'); put 'SYSSCP=' x 'SYSSCPEND=';
16 run;
WORKPATH=W:\WorkSAS_TD23084S00SAS017\Prc2 WORKPATHEND=
ENCODING=wlatin1 ENCODINGEND=
SYSVLONG=9.04.01M7P08052020 SYSVLONGEND=
SYSJOBID=23084 SYSJOBIDEND=
SYSSCP=WIN SYSSCPEND=
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
17
18
19 ;';";*/;
20
4 The SAS System 16:51 Thursday, October 13, 2022
21
22 %put E3969440A681A2408885998500000002;
E3969440A681A2408885998500000002
23
5 The SAS System 16:51 Thursday, October 13, 2022
24 ;';";*/;
25 libname datamart 'H:\DataProj\DFICMS\02_Datamarts\202204' ;
NOTE: Libref DATAMART was successfully assigned as follows:
Engine: V9
Physical Name: H:\DataProj\DFICMS\02_Datamarts\202204
26
27 ;';";*/;
28
6 The SAS System 16:51 Thursday, October 13, 2022
29
30 %put E3969440A681A2408885998500000003;
E3969440A681A2408885998500000003
31
7 The SAS System 16:51 Thursday, October 13, 2022
32 ;';";*/;
33
34 data null; retain libref; retain cobs 1;
35 set sashelp.vlibnam end=last;
36 if cobs EQ 1 then
37 put "LIBREFSSTART=";
38 cobs = 2;
39 if libref NE libname then
40 put %upcase("lib=") libname %upcase('libEND=');
41 libref = libname;
42 if last then
43 put "LIBREFSEND=";
44 run;
LIBREFSSTART=
LIB=WORK LIBEND=
LIB=DATAMART LIBEND=
LIB=SASDATA LIBEND=
LIB=O21_0822 LIBEND=
LIB=O22_0822 LIBEND=
LIB=REL0822 LIBEND=
LIB=O12_0122 LIBEND=
LIB=O13_0122 LIBEND=
LIB=O14_0122 LIBEND=
LIB=O15_0122 LIBEND=
LIB=O16_0122 LIBEND=
LIB=O17_0122 LIBEND=
LIB=O18_0122 LIBEND=
LIB=O19_0122 LIBEND=
LIB=O20_0122 LIBEND=
LIB=O21_0122 LIBEND=
LIB=DGT0622 LIBEND=
LIB=IFRHIS4 LIBEND=
LIB=IFRHIS LIBEND=
LIB=IFRACT LIBEND=
LIB=O11_0121 LIBEND=
LIB=O12_0121 LIBEND=
LIB=O13_0121 LIBEND=
LIB=O14_0121 LIBEND=
LIB=O15_0121 LIBEND=
LIB=O16_0121 LIBEND=
LIB=O17_0121 LIBEND=
LIB=O18_0121 LIBEND=
LIB=O19_0121 LIBEND=
LIB=TEST LIBEND=
LIB=OSA2020 LIBEND=
LIB=OSA2021 LIBEND=
LIB=NBO2022 LIBEND=
LIB=GEO1221 LIBEND=
LIB=O20_0821 LIBEND=
LIB=O21_0821 LIBEND=
LIB=OSADTM LIBEND=
LIB=PCRM2022 LIBEND=
LIB=O20_0121 LIBEND=
LIB=PCRH2021 LIBEND=
LIB=PCRH2022 LIBEND=
LIB=PCRM2021 LIBEND=
LIB=SIMWRK LIBEND=
LIB=SM9WRK LIBEND=
LIB=SVUWRK LIBEND=
LIB=SSTWRK LIBEND=
LIB=SNHWRK LIBEND=
LIB=SA4WRK LIBEND=
LIB=SA2WRK LIBEND=
LIB=SAP0821 LIBEND=
LIB=SAP0721 LIBEND=
LIB=STPSAMP LIBEND=
LIB=SASHELP LIBEND=
LIB=MAPS LIBEND=
LIB=MAPSSAS LIBEND=
LIB=MAPSGFK LIBEND=
LIB=SASUSER LIBEND=
LIBREFSEND=
NOTE: There were 952 observations read from the data set SASHELP.VLIBNAM.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.03 seconds
45
46
47 ;';";*/;
48
8 The SAS System 16:51 Thursday, October 13, 2022
49
50 %put E3969440A681A2408885998500000004;
E3969440A681A2408885998500000004
51
9 The SAS System 16:51 Thursday, October 13, 2022
52 data datamart.'sinistres_auth'n;
53 length 'SIN'n $10 'DBT_VALIDITE'n 8 'FIN_VALIDITE'n 8 'CR'n $3 'DATE_OUV'n 8 'DATE_SURV'n 8 'DATE_CLOS'n 8 'AN_RECOLTE'n
53 ! $3 'EVTNAT'n $5 'TYPE_GES'n $3 'ETATSIN'n $1 'LIEU'n $5 'PJ'n 8 'TOPMIG'n $3 'DATEOBS'n 8 'EVAL_BRUTE'n 8
53 ! 'EVAL_RECOURS'n 8 'EVAL_NET'n 8 'REG_PRINC'n 8 'REG_FRAIS'n 8 'REG_HONOR'n 8 'REG_RECOURS'n 8 'REG_NET'n 8 'RES_NET'n 8
53 ! 'GESTE_MONT'n 8 'ANNEE_SURV'n $4 'ANNEE_OUV'n $4 'ANNEE_CLOS'n $4 'ANNEE_OBS'n $4 'PRODUIT'n $2 'EVAL_REM'n 8 'CHARGE'n 8
53 ! ;
54 format 'DBT_VALIDITE'n E8601DT26.6 'FIN_VALIDITE'n E8601DT26.6 'DATE_OUV'n E8601DT26.6 'DATE_SURV'n E8601DT26.6
54 ! 'DATE_CLOS'n E8601DT26.6 'DATEOBS'n E8601DT26.6 ;
55 infile datalines delimiter='03'x STOPOVER;
56 input @;
57 if infile = '' then delete;
58 else do;
59 input 'SIN'n ;
60 input 'DBT_VALIDITE'n :B8601DT26.6 ;
61 input 'FIN_VALIDITE'n :B8601DT26.6 ;
62 input 'CR'n ;
63 input 'DATE_OUV'n :B8601DT26.6 ;
64 input 'DATE_SURV'n :B8601DT26.6 ;
65 input 'DATE_CLOS'n :B8601DT26.6 ;
66 input 'AN_RECOLTE'n ;
67 input 'EVTNAT'n ;
68 input 'TYPE_GES'n ;
69 input 'ETATSIN'n ;
70 input 'LIEU'n ;
71 input 'PJ'n ;
72 input 'TOPMIG'n ;
73 input 'DATEOBS'n :B8601DT26.6 ;
74 input 'EVAL_BRUTE'n ;
75 input 'EVAL_RECOURS'n ;
76 input 'EVAL_NET'n ;
77 input 'REG_PRINC'n ;
78 input 'REG_FRAIS'n ;
79 input 'REG_HONOR'n ;
80 input 'REG_RECOURS'n ;
81 input 'REG_NET'n ;
82 input 'RES_NET'n ;
83 input 'GESTE_MONT'n ;
84 input 'ANNEE_SURV'n ;
85 input 'ANNEE_OUV'n ;
86 input 'ANNEE_CLOS'n ;
87 input 'ANNEE_OBS'n ;
88 input 'PRODUIT'n ;
89 input 'EVAL_REM'n ;
90 input 'CHARGE'n ;
91 'SIN'n = translate('SIN'n, '0a'X, '01'x);
92 'SIN'n = translate('SIN'n, '0D'x, '02'x );
93 'CR'n = translate('CR'n, '0a'X, '01'x);
94 'CR'n = translate('CR'n, '0D'x, '02'x );
95 'AN_RECOLTE'n = translate('AN_RECOLTE'n, '0a'X, '01'x);
96 'AN_RECOLTE'n = translate('AN_RECOLTE'n, '0D'x, '02'x );
97 'EVTNAT'n = translate('EVTNAT'n, '0a'X, '01'x);
98 'EVTNAT'n = translate('EVTNAT'n, '0D'x, '02'x );
99 'TYPE_GES'n = translate('TYPE_GES'n, '0a'X, '01'x);
100 'TYPE_GES'n = translate('TYPE_GES'n, '0D'x, '02'x );
101 'ETATSIN'n = translate('ETATSIN'n, '0a'X, '01'x);
102 'ETATSIN'n = translate('ETATSIN'n, '0D'x, '02'x );
103 'LIEU'n = translate('LIEU'n, '0a'X, '01'x);
104 'LIEU'n = translate('LIEU'n, '0D'x, '02'x );
105 'TOPMIG'n = translate('TOPMIG'n, '0a'X, '01'x);
106 'TOPMIG'n = translate('TOPMIG'n, '0D'x, '02'x );
107 'ANNEE_SURV'n = translate('ANNEE_SURV'n, '0a'X, '01'x);
108 'ANNEE_SURV'n = translate('ANNEE_SURV'n, '0D'x, '02'x );
109 'ANNEE_OUV'n = translate('ANNEE_OUV'n, '0a'X, '01'x);
110 'ANNEE_OUV'n = translate('ANNEE_OUV'n, '0D'x, '02'x );
111 'ANNEE_CLOS'n = translate('ANNEE_CLOS'n, '0a'X, '01'x);
112 'ANNEE_CLOS'n = translate('ANNEE_CLOS'n, '0D'x, '02'x );
113 'ANNEE_OBS'n = translate('ANNEE_OBS'n, '0a'X, '01'x);
114 'ANNEE_OBS'n = translate('ANNEE_OBS'n, '0D'x, '02'x );
115 'PRODUIT'n = translate('PRODUIT'n, '0a'X, '01'x);
116 'PRODUIT'n = translate('PRODUIT'n, '0D'x, '02'x );
117 ;
118 end;
119 datalines4;
10 The SAS System 16:51 Thursday, October 13, 2022
NOTE: DATA statement used (Total process time):
real time 0.16 seconds
cpu time 0.14 seconds
NOTE: The data set DATAMART.SINISTRES_AUTH has 1000 observations and 32 variables.
NOTE: Compressing data set DATAMART.SINISTRES_AUTH decreased size by 0.00 percent.
Compressed is 4 pages; un-compressed would require 4 pages.
32139 ;;;;
32140 ;;;;
32141
11 The SAS System 16:51 Thursday, October 13, 2022
32142 ;';";*/;
32143 quit;
32144
32145 ;';";*/;
32146
12 The SAS System 16:51 Thursday, October 13, 2022
32147
32148 %put E3969440A681A2408885998500000005;
E3969440A681A2408885998500000005
32149
13 The SAS System 16:51 Thursday, October 13, 2022
32150 ;';";*/;
32151 data null; e = exist("datamart.'sinistres_auth'n");
32152 v = exist("datamart.'sinistres_auth'n", 'VIEW');
32153 if e or v then e = 1;
32154 te='TABLE_EXISTS='; put te e;run;
TABLE_EXISTS= 1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32155
32156
32157 ;';";*/;
32158
14 The SAS System 16:51 Thursday, October 13, 2022
32159
32160 %put E3969440A681A2408885998500000006;
E3969440A681A2408885998500000006
32161
15 The SAS System 16:51 Thursday, October 13, 2022
32162 ;';";*/;
32163 %let engine=BAD;
32164 proc sql;
32164 ! select distinct engine into :engine from sashelp.VLIBNAM where libname = 'DATAMART';
32164 ! ;%put engstart=&engine
32164 ! engend=;
engstart=V9 engend=
32165 quit;
NOTE: The PROCEDURE SQL printed page 1.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.03 seconds
32166
32167 ;';";*/;
32168
16 The SAS System 16:51 Thursday, October 13, 2022
32169
32170 %put E3969440A681A2408885998500000007;
E3969440A681A2408885998500000007
32171
17 The SAS System 16:51 Thursday, October 13, 2022
32172 ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;
32172 ! ods graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1
32173 ;';";*/;
32174 proc contents data=datamart.'sinistres_auth'n ;ods output Variables=work._variables ;run;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
NOTE: The data set WORK._VARIABLES has 32 observations and 7 variables.
NOTE: Compressing data set WORK._VARIABLES increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
32175
32176 ;';";*/;
32177 ods html5 (id=saspy_internal) close;ods listing;
32178
18 The SAS System 16:51 Thursday, October 13, 2022
32179
32180 %put E3969440A681A2408885998500000008;
E3969440A681A2408885998500000008
32181
19 The SAS System 16:51 Thursday, October 13, 2022
32182 ;';";*/;
32183 data null; e = exist("work.'_variables'n");
32184 v = exist("work.'_variables'n", 'VIEW');
32185 if e or v then e = 1;
32186 te='TABLE_EXISTS='; put te e;run;
TABLE_EXISTS= 1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32187
32188
32189 ;';";*/;
32190
20 The SAS System 16:51 Thursday, October 13, 2022
32191
32192 %put E3969440A681A2408885998500000009;
E3969440A681A2408885998500000009
32193
21 The SAS System 16:51 Thursday, October 13, 2022
32194 ;';";*/;
32195 data work.sasdata2dataframe / view=work.sasdata2dataframe; set work.'_variables'n ;run;
NOTE: DATA STEP view saved on file WORK.SASDATA2DATAFRAME.
NOTE: A stored DATA STEP view cannot run under a different operating system.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32196 data null; file LOG; d = open('work.sasdata2dataframe');
32197 length var $256;
32198 lrecl = attrn(d, 'LRECL'); nvars = attrn(d, 'NVARS');
32199 lr='LRECL='; vn='VARNUMS='; vl='VARLIST='; vt='VARTYPE=';
32200 put lr lrecl; put vn nvars; put vl;
32201 do i = 1 to nvars; var = compress(varname(d, i), '00'x); put var; end;
32202 put vt;
32203 do i = 1 to nvars; var = vartype(d, i); put var; end;
32204 run;
LRECL= 327
VARNUMS= 7
VARLIST=
Member
Num
Variable
Type
Len
Pos
Format
VARTYPE=
C
N
C
C
N
N
C
NOTE: View WORK.SASDATA2DATAFRAME.VIEW used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32205
32206 ;';";*/;
32207
22 The SAS System 16:51 Thursday, October 13, 2022
32208
32209 %put E3969440A681A2408885998500000011;
E3969440A681A2408885998500000011
32210
23 The SAS System 16:51 Thursday, October 13, 2022
32211 ;';";*/;
32212 proc delete data=work.sasdata2dataframe(memtype=view);run;
NOTE: Deleting WORK.SASDATA2DATAFRAME (memtype=VIEW).
NOTE: PROCEDURE DELETE used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32213 data work._n_u_ll;output;run;
NOTE: Compression was disabled for data set WORK._N_U_LL because compression overhead would increase the size of the data set.
NOTE: The data set WORK._N_U_LL has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32214 data null; set work._n_u_ll work.'_variables'n ;put 'FMT_CATS=';
32215 _tom = vformatn('Member'n);put _tom;
32216 _tom = vformatn('Num'n);put _tom;
32217 _tom = vformatn('Variable'n);put _tom;
32218 _tom = vformatn('Type'n);put _tom;
32219 _tom = vformatn('Len'n);put _tom;
32220 _tom = vformatn('Pos'n);put _tom;
32221 _tom = vformatn('Format'n);put _tom;
32222 stop;
32223 run;
FMT_CATS=
$
F
$
$
F
BEST
$
NOTE: There were 1 observations read from the data set WORK._N_U_LL.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
32224 proc delete data=work._n_u_ll;run;
NOTE: Deleting WORK._N_U_LL (memtype=DATA).
NOTE: PROCEDURE DELETE used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32225
32226 ;';";*/;
32227
24 The SAS System 16:51 Thursday, October 13, 2022
32228
32229 %put E3969440A681A2408885998500000012;
E3969440A681A2408885998500000012
32230
25 The SAS System 16:51 Thursday, October 13, 2022
32231 data null; set work.'_variables'n ;
32232 format 'Num'n best32.; format 'Len'n best32.; format 'Pos'n best32.;
32233 file _tomods1 lrecl=1048576 dlm='02'x recfm=v termstr=NL encoding='utf-8';
32234 'Member'n = translate('Member'n, '2020'x, '0102'x); if missing('Num'n) then 'Num'n = .; 'Variable'n =
32234 ! translate('Variable'n, '2020'x, '0102'x); 'Type'n = translate('Type'n, '2020'x, '0102'x); if missing('Len'n) then 'Len'n
32234 ! = .; if missing('Pos'n) then 'Pos'n = .; 'Format'n = translate('Format'n, '2020'x, '0102'x);
32235 put 'Member'n 'Num'n 'Variable'n 'Type'n 'Len'n 'Pos'n 'Format'n '01'x;
32236 run;
NOTE: The file _TOMODS1 is:
Filename=W:\WorkSAS\_TD23084_S00SAS017_\Prc2\_tomods1,
RECFM=V,LRECL=4194304,File Size (bytes)=0,
Last Modified=13 octobre 2022 16 h 53,
Create Time=13 octobre 2022 16 h 53
NOTE: 32 records were written to the file _TOMODS1.
The minimum record length was 42.
The maximum record length was 60.
NOTE: There were 32 observations read from the data set WORK._VARIABLES.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32237
26 The SAS System 16:51 Thursday, October 13, 2022
32238
32239
27 The SAS System 16:51 Thursday, October 13, 2022
32240
32241 %put E3969440A681A2408885998500000010;
E3969440A681A2408885998500000010
32242
28 The SAS System 16:51 Thursday, October 13, 2022
32243 ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;
32243 ! ods graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1
32244 ;';";*/;
32245 proc delete data=work._variables; run;
NOTE: Deleting WORK._VARIABLES (memtype=DATA).
NOTE: PROCEDURE DELETE used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32246
32247 ;';";*/;
32248 ods html5 (id=saspy_internal) close;ods listing;
32249
29 The SAS System 16:51 Thursday, October 13, 2022
32250
32251 %put E3969440A681A2408885998500000013;
E3969440A681A2408885998500000013
32252
30 The SAS System 16:51 Thursday, October 13, 2022
32253 ;';";*/;
32254 data null; e = exist("datamart.'sinistres_auth'n");
32255 v = exist("datamart.'sinistres_auth'n", 'VIEW');
32256 if e or v then e = 1;
32257 te='TABLE_EXISTS='; put te e;run;
TABLE_EXISTS= 1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32258
32259
32260 ;';";*/;
32261
31 The SAS System 16:51 Thursday, October 13, 2022
32262
32263 %put E3969440A681A2408885998500000014;
E3969440A681A2408885998500000014
32264
32 The SAS System 16:51 Thursday, October 13, 2022
32265 ;';";*/;
32266 %let lastobs=-1;
32267 proc sql;
32267 ! select count(*) format best32. into :lastobs from datamart.'sinistres_auth'n ;
32267 ! %put lastobs=&lastobs lastobsend=;
lastobs= 1000 lastobsend=
32268 quit;
NOTE: The PROCEDURE SQL printed page 2.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32269
32270 ;';";*/;
32271
33 The SAS System 16:51 Thursday, October 13, 2022
32272
32273 %put E3969440A681A2408885998500000015;
E3969440A681A2408885998500000015
32274
34 The SAS System 16:51 Thursday, October 13, 2022
32275 ;';";*/;
32276 data null; e = exist("datamart.'sinistres_auth'n");
32277 v = exist("datamart.'sinistres_auth'n", 'VIEW');
32278 if e or v then e = 1;
32279 te='TABLE_EXISTS='; put te e;run;
TABLE_EXISTS= 1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32280
32281
32282 ;';";*/;
32283
35 The SAS System 16:51 Thursday, October 13, 2022
32284
32285 %put E3969440A681A2408885998500000016;
E3969440A681A2408885998500000016
32286
36 The SAS System 16:51 Thursday, October 13, 2022
32287 ;';";*/;
32288 proc contents data=datamart.'sinistres_auth'n ;run;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
NOTE: The PROCEDURE CONTENTS printed page 3.
32289
32290 ;';";*/;
32291
37 The SAS System 16:51 Thursday, October 13, 2022
32292
32293 %put E3969440A681A2408885998500000017;
E3969440A681A2408885998500000017
32294
Then i executed this script as you want :
import pandas as pd
import saspy
import time
print('Chargement du dataframe...')
df = pd.read_pickle("c:\temp\sinis.pkl")
print('Ouverture de la session SAS sur le serveur...')
sas = saspy.SASsession(cfgfile='C:\Users\aurelien.thiery\PycharmProjects\sascfg_personal.py')
start = time.time()
cl = sas.df_char_lengths(df)
print('time= ', time.time()-start)
print(cl)
And the result :
Chargement du dataframe...
Ouverture de la session SAS sur le serveur...
Using SAS Config named: winiomIWA
SAS Connection established. Subprocess id is 17412
time= 70.60996747016907
{'SIN': 10, 'CR': 3, 'AN_RECOLTE': 4, 'EVTNAT': 5, 'TYPE_GES': 3, 'ETATSIN': 1, 'LIEU': 5, 'TOPMIG': 3, 'ANNEE_SURV': 4, 'ANNEE_OUV': 4, 'ANNEE_CLOS': 6, 'ANNEE_OBS': 4, 'PRODUIT': 2}
Aurélien
De : Tom Weber @.> Envoyé : jeudi 13 octobre 2022 16:08 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
I don't see any of the output for the dataframe info. Can you just run python interactively and submit all the code, then I can actually see what code was executed and get all of the output.
python
import pandas as pd
import saspy
print('Chargement du dataframe...')
df = pd.read_pickle("c:\temp\sinis.pkl")
print('NB lignes dataframe:', len(df.index))
print('Ouverture de la session SAS sur le serveur...')
sas = saspy.SASsession(cfgfile='C:\Users\aurelien.thiery\PycharmProjects\sascfg_personal.py')
print(sas)
print('Allocation de la bibliothèque...')
sas.saslib(libref='datamart', path='H:\DataProj\DFICMS\02_Datamarts\202204')
print('Conversion du dataframe en table SAS...')
sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart')
print(sas.lastlog())
df.shape
df.info()
sas.df_char_lengths(df[:1000])
sd.columnInfo()
df.shape
sd.obs()
sd.contents(results='text')
print(sas.saslog())
Also, I'd be curious to know the following (on the full df):
import time
start = time.time()
cl = sas.df_char_lengths(df)
print('time= ', time.time()-start)
print(cl)
Thanks! Tom
— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1277672844 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KAPJ24EWKYMVCYHHGTWDAJUJANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KA7Q3K3HBWMDRWN4E3WDAJUJA5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSME66YY.gif Message ID: @. @.> >
That's great, thanks! It all shows up that way. Also, good news on my side; I was able to resize my linux development machine (it's virtual like everything else these days), which is good on all accounts. So, I was just able to load that pickle file and run the code here too! I'm in the middle of running df2sd() now. It's not the same as yours since I'm on linux and I'm actually using IOM Remote, but it's the same code, for the most part, and I'm seeing what I kinda expected. My SAS process is still only around 70M of memory, but that's probably because it's only set for the default memsize. I expect in your case that since you set memsize way higher, SAS is simply using it even though there's no requirement; it's streaming the data to disk, not storing it all in memory. But if you give SAS lots of mem, it tries to use it for things, so I think your 30g is 'just because it can', as there's no other reason it would be using that memory. Just a data step streaming the data to disk.
Your sas.df_char_lengths(df) only took 70 seconds, while on my side it took 215 (got the same results too, as it should!). So, that's good on your side!
>>> start = time.time()
>>> cl = sas.df_char_lengths(df)
>>> print('time= ', time.time()-start)
time= 215.42656183242798
>>> print(cl)
{'SIN': 10, 'CR': 3, 'AN_RECOLTE': 4, 'EVTNAT': 5, 'TYPE_GES': 3, 'ETATSIN': 1, 'LIEU': 7, 'TOPMIG': 3, 'ANNEE_SURV': 4, 'ANNEE_OUV': 4, 'ANNEE_CLOS': 6, 'ANNEE_OBS': 4, 'PRODUIT': 2}
>>>
Here's my status as I'm still in the middle of running this:
Well, so my df2sd() finished, at least it seemed so from both systems; python client machine and the SAS server machine. But, I never got control back in my python process; seemed saspy hung at the (presumably) end of the df2df step. So, maybe that's what you were seeing too. I'll have to investigate this and see what's happening. Hopefully, I just reproduced what you saw and when I figure out what happened, I'll be able to get it working for your case as well.
So, that's good, at least. I'll need to spend some time tracking this down. I'll post when after I've had a chance to track this. Doesn't help it takes a long time to run, but it seems to be something to do with transferring this volume of data.
Thanks! Tom
Ok, I believe I've reproduced and captured the failure. though it makes no sense as of yet. I need to do some more investigation to understand why. It's actually a problem on the SAS side and it accounts for both the excessive memory (which it runs out of) and the hang back in saspy, as it disconnects the client, but my client doesn't happen to catch that and fail. I need to see why, but probably waiting for output that never comes or something like that where I'm not getting a failure from an IOM client call. Either way, I believe it's the same problem you're running into. So, that's progress. More to come ...
Ok thanks for the news.AurélienLe 13 oct. 2022 à 22:08, Tom Weber @.***> a écrit : Ok, I believe I've reproduced and captured the failure. though it makes no sense as of yet. I need to do some more investigation to understand why. It's actually a problem on the SAS side and it accounts for both the excessive memory (which it runs out of) and the hang back in saspy, as it disconnects the client, but my client doesn't happen to catch that and fail. I need to see why, but probably waiting for output that never comes or something like that where I'm not getting a failure from an IOM client call. Either way, I believe it's the same problem you're running into. So, that's progress. More to come ...
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>
Ok, sorry it's after 5 on Friday for you, but I have figured this out and fixed it at main. Had to run tests and validate everything and dealing with that huge dataframe made things take longer too, for the initial diagnosis. Hopefully you can pull from there and try it out:
pip uninstall -y saspy
pip install git+https://git@github.com/sassoftware/saspy.git
The failures on the SAS server yesterday had me wondering, because after 5.8 million rows SAS started throwing errors for every one of the next 12 million rows, complaining about
2022-10-13T14:52:25,796 ERROR [00000009] :sastpw - ERROR: Maximum level of nesting of macro functions exceeded.
And I have no macros in the datastep I generated and ran. So I was really confused by that. But, that showed why the memory was being consumed; till it ran out and disconnected the client. Since it's IOM, it's buffering the log to be returned via the api (I'm guessing that's why it ran out of memory; seems like). So, I still need to debug my java IOM client code to see why it never got the disconnect, but, I'll worry about that next week. When it works; that never happens :)
What I finally found is that in the LIEU column, which is mostly numeric (MOSTLY), yet it's an object (so string), there are a few rows where that column has the following string: ';;;;;' or ';;;;'. The first occurrence of that is in row 5847832! What was happening is that my data step for streaming the data over uses 'infile datalines ...' and 'datalines4;' sending the data straight into SAS over stdin (inline data). ';;;;' is the string that terminates the data input for a data step specifying datalines4. So, after shy of 6million rows, the datastep terminated (cleanly - the table was good, but just 5.8 million rows), and the other 12 million rows were treated as more SAS code being processed by SAS. Obviously, all of it was wrong and not SAS code so it generated a ton or error output to the log and didn't really make sense.
I've handled this issue in df2sd() and have pushed it to main. If you can pull that and validate on your end that this fixes it for your test case (don't want to just say it worked for me), that would be great.
Thanks! Tom
Hi Tom,
Thanks !
But i can’t access github site (enterprise proxy block it).
Aurélien
De : Tom Weber @.> Envoyé : vendredi 14 octobre 2022 21:46 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
Ok, sorry it's after 5 on Friday for you, but I have figured this out and fixed it at main. Had to run tests and validate everything and dealing with that huge dataframe made things take longer too, for the initial diagnosis. Hopefully you can pull from there and try it out:
pip uninstall -y saspy pip install @.***/sassoftware/saspy.git
The failures on the SAS server yesterday had me wondering, because after 5.8 million rows SAS started throwing errors for every one of the next 12 million rows, complaining about
2022-10-13T14:52:25,796 ERROR [00000009] :sastpw - ERROR: Maximum level of nesting of macro functions exceeded.
And I have no macros in the datastep I generated and ran. So I was really confused by that. But, that showed why the memory was being consumed; till it ran out and disconnected the client. Since it's IOM, it's buffering the log to be returned via the api (I'm guessing that's why it ran out of memory; seems like). So, I still need to debug my java IOM client code to see why it never got the disconnect, but, I'll worry about that next week. When it works; that never happens :)
What I finally found is that in the LIEU column, which is mostly numeric (MOSTLY), yet it's an object (so string), there are a few rows where that column has the following string: ';;;;;' or ';;;;'. The first occurrence of that is in row 5847832! What was happening is that my data step for streaming the data over uses 'infile datalines ...' and 'datalines4;' sending the data straight into SAS over stdin (inline data). ';;;;' is the string that terminates the data input for a data step specifying datalines4. So, after shy of 6million rows, the datastep terminated (cleanly - the table was good, but just 5.8 million rows), and the other 12 million rows were treated as more SAS code being processed by SAS. Obviously, all of it was wrong and not SAS code so it generated a ton or error output to the log and didn't really make sense.
I've handled this issue in df2sd() and have pushed it to main. If you can pull that and validate on your end that this fixes it for your test case (don't want to just say it worked for me), that would be great.
Thanks! Tom
— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1279393723 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KBO4WLH5SBARWMV3Q3WDGZ7NANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KAG5ATNEHRVIPJDO3TWDGZ7NA5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMIH73W.gif Message ID: < @.> @.>
@AurelienThiery I sent the zip of the 'main' branch for you to download; you can then pip install from it; same as if you had access to github. This will be in the next release I publish, but it would be nice to verify it worked on your end, even though I'm sure it will. If you have to have a production version (what you get with just: pip install) I can build that next release with this. Just let me know. Thanks, Tom
Hi Tom,Ok thanks. I’m in training today but i can test the package tonight (this afternoon for you).AurélienLe 17 oct. 2022 à 04:20, Tom Weber @.***> a écrit : @AurelienThiery I sent the zip of the 'main' branch for you to download; you can then pip install from it; same as if you had access to github. This will be in the next release I publish, but it would be nice to verify it worked on your end, even though I'm sure it will. If you have to have a production version (what you get with just: pip install) I can build that next release with this. Just let me know. Thanks, Tom
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>
Tom,
I was able to test the new package and it works fine, the table contains all data (18 437 883 lines) :
And the SAS session closed itself. Thanks !!
Other subject : last friday i tested the reading of « big » SAS table (6 GB on the disk : 7 905 149 lines, 281 columns) on the linux server with docker :
import pandas as pd
import saspy
import time
print('Ouverture de la session SAS sur le serveur...')
sas = saspy.SASsession(cfgfile='/opt/sascfg_personal.py')
print('Allocation de la bibliothèque...')
sas.saslib(libref='mylib', path='H:\my_data')
print('Chargement de la table SAS dans un dataframe...')
start = time.time()
sd = sas.sasdata('mvtpra4', 'mylib')
df = sd.to_df()
print('Temps de chargement = ', time.time()-start)
print(df.head())
The step « df = sd.to_df() » (to convert sas data in pandas dataframe object) finished in error and all server RAM was full !! (64GB)
I will remake the test when you will publish the new package on pypi repository.
Aurélien
De : Tom Weber @.> Envoyé : lundi 17 octobre 2022 04:21 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Mention @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)
@AurelienThiery https://github.com/AurelienThiery I sent the zip of the 'main' branch for you to download; you can then pip install from it; same as if you had access to github. This will be in the next release I publish, but it would be nice to verify it worked on your end, even though I'm sure it will. If you have to have a production version (what you get with just: pip install) I can build that next release with this. Just let me know. Thanks, Tom
— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1280182998 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KH4K27BPZHIKEG7FYLWDSZXRANCNFSM6AAAAAARCNQKGQ . You are receiving this because you were mentioned. https://github.com/notifications/beacon/AD5A4KHFNV7U6Q4KKVQBORTWDSZXRA5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMJYFNM.gif Message ID: @. @.> >
Great to hear! I just published the new release; V4.3.5, that had this fix in it. So it's available now on Pypi; by tomorrow on Conda. I was also able to adjust the java IOM client code to catch this instead of hanging, so, although this won' t happen again, if some other thing like the sever terminating unexpectedly happens while streaming all this data, it should terminate and return to python with an exception. I'll close this one, but if you still have another problem, just open a new one for it! Thanks! Tom
Hi! We use saspy (version 4.3.3) to export a pandas dataframe to SAS dataset. But the dataset doesn't contain all data: 18 437 883 in dataframe and only 5 847 832 in dataset. I don't understand, we have no error. And when the dataset is created the RAM increase to reach 30GB! During this time the file isn't locked (i can rename it). And finaly the sas.exe process never ends (i have to kill it manually)... Attached some files. Thanks. Best regards, Aurélien
export.txt