sassoftware / saspy

A Python interface module to the SAS System. It works with Linux, Windows, and Mainframe SAS as well as with SAS in Viya.
https://sassoftware.github.io/saspy
Other
367 stars 149 forks source link

Convert dataframe to dataset #487

Closed AurelienThiery closed 1 year ago

AurelienThiery commented 1 year ago

Hi! We use saspy (version 4.3.3) to export a pandas dataframe to SAS dataset. But the dataset doesn't contain all data: 18 437 883 in dataframe and only 5 847 832 in dataset. I don't understand, we have no error. And when the dataset is created the RAM increase to reach 30GB! During this time the file isn't locked (i can rename it). And finaly the sas.exe process never ends (i have to kill it manually)... Attached some files. Thanks. Best regards, Aurélien

export.txt cmd desc_dataset task_manager_T1 task_manager_T2 task_manager_T3

tomweber-sas commented 1 year ago

I can help with this. I can't really tell that much though from what I see here. I need more info on the dataframe and the sas dataset. What's to be seen in the CMD window seems to show df2sd() ran, and it seems to have terminated the SAS session without issue. But, I can't really tell without more info. The df has some index on it that won't be - can't be - included in the dataset, so depending upon what that is, it could account for a discrepancy; but I don't see anything about either the df or dataset to guess. Can you run the following and provide me the output (at the end of your current program after running df2sd)? You may need to wrap some of this w/ print() if you're running it in batch, but not if you run it interactively.

# change the line you have for df2sd to assign it to sd so we can reference the data set next:
# [...] all your program here but change the last line like this:
sd = sas.df2sd(df, 'sinistres_auth', 'datamart')

# after that run this
print(sas.lastlog())
df.dtypes
sas.df_char_lengths(df)
sd = sas.sasdata('sinistres_auth', 'datamart')
sd.columnInfo()
df.shape
sd.obs()
sd.contents(results='text')

With this I can at least see what the df is like and what the data set has in it.

Thanks, Tom

AurelienThiery commented 1 year ago

Hi Tom, The problem is the execution is blocked here (the next instruction are not executed) :

@.***

I can send you the pickle file if you want make test. The file is very large (> 4GB), i can send you by Crédit Agricole ftp website (named Artefis). If you are ok can you send me your email please ?

Best regards, Aurélien

De : Tom Weber @.> Envoyé : mardi 11 octobre 2022 18:18 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

I can help with this. I can't really tell that much though from what I see here. I need more info on the dataframe and the sas dataset. What's to be seen in the CMD window seems to show df2sd() ran, and it seems to have terminated the SAS session without issue. But, I can't really tell without more info. The df has some index on it that won't be - can't be - included in the dataset, so depending upon what that is, it could account for a discrepancy; but I don't see anything about either the df or dataset to guess. Can you run the following and provide me the output (at the end of your current program after running df2sd)? You may need to wrap some of this w/ print() if you're running it in batch, but not if you run it interactively.

change the line you have for df2sd to assign it to sd so we can reference the data set next:

[...] all your program here but change the last line like this:

sd = sas.df2sd(df, 'sinistres_auth', 'datamart')

after that run this

print(sas.lastlog())

df.dtypes

sas.df_char_lengths(df)

sd = sas.sasdata('sinistres_auth', 'datamart')

sd.columnInfo()

df.shape

sd.obs()

sd.contents(results='text')

With this I can at least see what the df is like and what the data set has in it.

Thanks, Tom

— Reply to this email directly, view it on GitHubhttps://github.com/sassoftware/saspy/issues/487#issuecomment-1274951211, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD5A4KDQAAGUBPUV3WFDMZDWCWHKFANCNFSM6AAAAAARCNQKGQ. You are receiving this because you authored the thread.Message ID: @.**@.>>

tomweber-sas commented 1 year ago

Sure, I can try that out. My email is on my account here, but here it is too: tom.weber@sas.com.

AurelienThiery commented 1 year ago

Ok perfect!I send you the file tomorrow.You are working in Cary?AurélienLe 12 oct. 2022 à 17:27, Tom Weber @.> a écrit : Sure, I can try that out. My email is on my account here, but here it is too: @.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

tomweber-sas commented 1 year ago

Yes, I'm in Cary. I'm not sure where you said it stopped. If it's on the df2sd, something you can do to just get the info is to change it to just use the first n rows; df[:1000] So if you just play with that, it would need to be for these 2 methods:

sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart') sas.df_char_lengths(df[:1000])

The other references using 'df' should be ok.

Also, just playing with this a little yesterday, I did df = sas.sd2df('cars','sashelp') followed by df = df.append(df) till I had 7 million rows or so, and then did sd = sas.df2sd(df) and I never saw the memory go above maybe 70k; I looked at taskmanager, like you showed as well as using process explorer; they both showed equivalent cpu and memory, and I/O for the process. So, I'm not sure where your 30G of memory could be coming from either. That doesn't make any sense to me yet.

But, we'll get to the bottom of it.

One question is about the drives you're using. SAS is on C:, while WORK is on W: and the libref points to H:. Are these actual drives (or partitions) on your actual PC? Or are they remote mounted volumes? Just trying to come up with some guesses as to what you're seeing w/ that memory consumption. Not that I have a correlation with drives; it's just something that jumped out at me looking at the output.

Thanks, Tom

AurelienThiery commented 1 year ago

Ok i will send you the file tonight (for me).I answer your questions tomorrow.AurélienLe 12 oct. 2022 à 17:50, Tom Weber @.***> a écrit : Yes, I'm in Cary. I'm not sure where you said it stopped. If it's on the df2sd, something you can do to just get the info is to change it to just use the first n rows; df[:1000] So if you just play with that, it would need to be for these 2 methods: sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart') sas.df_char_lengths(df[:1000]) The other references using 'df' should be ok. Also, just playing with this a little yesterday, I did df = sas.sd2df('cars','sashelp') followed by df = df.append(df) till I had 7 million rows or so, and then did sd = sas.df2sd(df) and I never saw the memory go above maybe 70k; I looked at taskmanager, like you showed as well as using process explorer; they both showed equivalent cpu and memory, and I/O for the process. So, I'm not sure where your 30G of memory could be coming from either. That doesn't make any sense to me yet. But, we'll get to the bottom of it. One question is about the drives you're using. SAS is on C:, while WORK is on W: and the libref points to H:. Are these actual drives (or partitions) on your actual PC? Or are they remote mounted volumes? Just trying to come up with some guesses as to what you're seeing w/ that memory consumption. Not that I have a correlation with drives; it's just something that jumped out at me looking at the output. Thanks, Tom

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

AurelienThiery commented 1 year ago

Tom,

For the tests I’m on my windows PC and processing (python script) connects to the SAS server (all PCs and servers are in the same domain without firewall). But the target is to execute processing on a linux server with docker and we have the same result.

The SAS Server is on Windows 2019 (4 CPU and 64GB RAM in development environment, 16 CPU and 128 GB RAM in production) and SAS version is 9.4 M7. For the tests i changed the sasv9.cfg to increase RAM limit to 32GB.

On the SAS server all the disks are remote mounted volumes (netapp full flash)

To finish i specify we have no problem with small dataframe.

I send you the pickle file with our internal solution : Artefis.

Aurélien

PS : sorry for my english L

De : Tom Weber @.> Envoyé : mercredi 12 octobre 2022 17:50 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Yes, I'm in Cary. I'm not sure where you said it stopped. If it's on the df2sd, something you can do to just get the info is to change it to just use the first n rows; df[:1000] So if you just play with that, it would need to be for these 2 methods:

sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart') sas.df_char_lengths(df[:1000])

The other references using 'df' should be ok.

Also, just playing with this a little yesterday, I did df = sas.sd2df('cars','sashelp') followed by df = df.append(df) till I had 7 million rows or so, and then did sd = sas.df2sd(df) and I never saw the memory go above maybe 70k; I looked at taskmanager, like you showed as well as using process explorer; they both showed equivalent cpu and memory, and I/O for the process. So, I'm not sure where your 30G of memory could be coming from either. That doesn't make any sense to me yet.

But, we'll get to the bottom of it.

One question is about the drives you're using. SAS is on C:, while WORK is on W: and the libref points to H:. Are these actual drives (or partitions) on your actual PC? Or are they remote mounted volumes? Just trying to come up with some guesses as to what you're seeing w/ that memory consumption. Not that I have a correlation with drives; it's just something that jumped out at me looking at the output.

Thanks, Tom

— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276395310 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KDW4LSP23FSLRYRJCDWC3MZBANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KB4HZ2A5RO66WIKHXLWC3MZBA5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMCQ7S4.gif Message ID: @. @.> >

tomweber-sas commented 1 year ago

Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom

AurelienThiery commented 1 year ago

I have python 3.10.7 on my PC with 16GB RAM. I can load the pickle file in dataframe when i close all applications.Le 12 oct. 2022 à 21:43, Tom Weber @.***> a écrit : Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

AurelienThiery commented 1 year ago

Hi Tom,

I executed this script (no problem), the file log is in this mail.

Best regards,

Aurélien

De : Tom Weber @.> Envoyé : mercredi 12 octobre 2022 21:43 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom

— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276654410 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KGDDUL56I3TVTLW6U3WC4ID5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KDXBV2IVDKPTWCNDGTWC4ID5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMDAZUU.gif Message ID: @. @.> >

AurelienThiery commented 1 year ago

With the log ;-)

De : THIERY Aurelien Envoyé : jeudi 13 octobre 2022 10:10 À : 'sassoftware/saspy' @.>; sassoftware/saspy @.> Cc : Author @.***> Objet : RE: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Hi Tom,

I executed this script (no problem), the file log is in this mail.

Best regards,

Aurélien

De : Tom Weber @. @.> > Envoyé : mercredi 12 octobre 2022 21:43 À : sassoftware/saspy @. @.> > Cc : THIERY Aurelien @. @.> >; Author @. @.> > Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom

— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276654410 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KGDDUL56I3TVTLW6U3WC4ID5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KDXBV2IVDKPTWCNDGTWC4ID5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMDAZUU.gif Message ID: @. @.> >

tomweber-sas commented 1 year ago

Hey, assuming the log was an attachment. It didn't come through. Maybe a direct email to me? I don't see anything saying an attachment was stripped out, or anything like that.

From: AurelienThiery @.> Sent: Thursday, October 13, 2022 5:30 AM To: sassoftware/saspy @.> Cc: Tom Weber @.>; Comment @.> Subject: Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

EXTERNAL With the log ;-)

De : THIERY Aurelien Envoyé : jeudi 13 octobre 2022 10:10 À : 'sassoftware/saspy' @.<mailto:@.>>; sassoftware/saspy @.<mailto:@.>> Cc : Author @.<mailto:@.>> Objet : RE: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Hi Tom,

I executed this script (no problem), the file log is in this mail.

Best regards,

Aurélien

De : Tom Weber @.<mailto:@.> @.<mailto:@.>> > Envoyé : mercredi 12 octobre 2022 21:43 À : sassoftware/saspy @.<mailto:@.> @.<mailto:@.>> > Cc : THIERY Aurelien @.<mailto:@.> @.<mailto:@.>> >; Author @.<mailto:@.> @.<mailto:@.>> > Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom

- Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276654410 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KGDDUL56I3TVTLW6U3WC4ID5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KDXBV2IVDKPTWCNDGTWC4ID5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMDAZUU.gif Message ID: @.<mailto:@.> @.<mailto:@.>> >

- Reply to this email directly, view it on GitHubhttps://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsassoftware%2Fsaspy%2Fissues%2F487%23issuecomment-1277311147&data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084426200%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LvjCdLwKMg%2Fw58QHd67t36DsjRZx6fiqxkcRQFWZvHg%3D&reserved=0, or unsubscribehttps://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEHDYZWGEEFNK2VUQFVYXB3WC7JBXANCNFSM6AAAAAARCNQKGQ&data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084582433%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=eTXQGHxwihP%2BKhxAHH1vKX2cK78R5FYfUx7m5LcJgTw%3D&reserved=0. You are receiving this because you commented.Message ID: @.**@.>>

AurelienThiery commented 1 year ago

Ok i send you the log by ftp website.

De : Tom Weber @.> Envoyé : jeudi 13 octobre 2022 15:16 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Hey, assuming the log was an attachment. It didn't come through. Maybe a direct email to me? I don't see anything saying an attachment was stripped out, or anything like that.

From: AurelienThiery @. <mailto:@.> > Sent: Thursday, October 13, 2022 5:30 AM To: sassoftware/saspy @. <mailto:@.> > Cc: Tom Weber @. <mailto:@.> >; Comment @. <mailto:@.> > Subject: Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

EXTERNAL With the log ;-)

De : THIERY Aurelien Envoyé : jeudi 13 octobre 2022 10:10 À : 'sassoftware/saspy' @. <mailto:@.%3cmailto:@.> <mailto:@.>>; sassoftware/saspy @. <mailto:@.%3cmailto:@.> <mailto:@.>> Cc : Author @. <mailto:@.%3cmailto:@.> <mailto:@.>> Objet : RE: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Hi Tom,

I executed this script (no problem), the file log is in this mail.

Best regards,

Aurélien

De : Tom Weber @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> > Envoyé : mercredi 12 octobre 2022 21:43 À : sassoftware/saspy @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> > Cc : THIERY Aurelien @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> >; Author @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> > Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Ok, I downloaded the file and extracted it. Python on my pc doesn't support the format. I have python3.6 and the file is a newer format than I can read with the pickle support it has. And I haven' been able to upgrade anything in that deployment to get the needed support. On my linux development environment that has a newer Python, I can't load the entire file as it runs out of memory. I think trying to run the code w/ just df[:1000] will get me the info I'm looking for. Maybe write out a pickle file w/ that amount of data and upload it, then I can load that in my dev env and be able to do some more diagnosis. Tom

- Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1276654410 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KGDDUL56I3TVTLW6U3WC4ID5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KDXBV2IVDKPTWCNDGTWC4ID5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMDAZUU.gif Message ID: @. <mailto:@.%3cmailto:@.> <mailto:@.> @. <mailto:@.%3cmailto:@.> <mailto:@.>> >

- Reply to this email directly, view it on GitHub<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsassoftware%2Fsaspy%2Fissues%2F487%23issuecomment-1277311147 https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsassoftware%2Fsaspy%2Fissues%2F487%23issuecomment-1277311147&data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084426200%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LvjCdLwKMg%2Fw58QHd67t36DsjRZx6fiqxkcRQFWZvHg%3D&reserved=0 &data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084426200%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LvjCdLwKMg%2Fw58QHd67t36DsjRZx6fiqxkcRQFWZvHg%3D&reserved=0>, or unsubscribe<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEHDYZWGEEFNK2VUQFVYXB3WC7JBXANCNFSM6AAAAAARCNQKGQ https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEHDYZWGEEFNK2VUQFVYXB3WC7JBXANCNFSM6AAAAAARCNQKGQ&data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084582433%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=eTXQGHxwihP%2BKhxAHH1vKX2cK78R5FYfUx7m5LcJgTw%3D&reserved=0 &data=05%7C01%7Ctom.weber%40sas.com%7C0a635ce5db604db9f84308daacfd8283%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C638012502084582433%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=eTXQGHxwihP%2BKhxAHH1vKX2cK78R5FYfUx7m5LcJgTw%3D&reserved=0>. You are receiving this because you commented.Message ID: @.**@. mailto:***@***.******@***.*** >>

— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1277592225 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KFDQO7PW33C3YE5Z6TWDADP5ANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KBVDKG22CWTPZY3LYDWDADP5A5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSME2BKC.gif Message ID: @. @.> >

tomweber-sas commented 1 year ago

cool, thanks! got it. Now let me analyze it and see what I see!

tomweber-sas commented 1 year ago

I don't see any of the output for the dataframe info. Can you just run python interactively and submit all the code, then I can actually see what code was executed and get all of the output.

>>> python

import pandas as pd
import saspy
print('Chargement du dataframe...')
df = pd.read_pickle("c:\\temp\\sinis.pkl")
print('NB lignes dataframe:', len(df.index))
print('Ouverture de la session SAS sur le serveur...')
sas = saspy.SASsession(cfgfile='C:\\Users\\aurelien.thiery\\PycharmProjects\\sascfg_personal.py')
print(sas)
print('Allocation de la bibliothèque...')
sas.saslib(libref='datamart', path='H:\\DataProj\DFICMS\\02_Datamarts\\202204')
print('Conversion du dataframe en table SAS...')

sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart')

print(sas.lastlog())
df.shape
df.info()
sas.df_char_lengths(df[:1000])
sd.columnInfo()
df.shape
sd.obs()
sd.contents(results='text')

print(sas.saslog())

Also, I'd be curious to know the following (on the full df):

import time
start = time.time()
cl = sas.df_char_lengths(df)
print('time= ', time.time()-start)
print(cl)

Thanks! Tom

AurelienThiery commented 1 year ago

Tom,

The instructions in green :

C:\Users\aurelien.thiery\PycharmProjects\pythonProject1\venv\Scripts\python.exe "C:/Program Files/JetBrains/PyCharm Community Edition 2022.2.2/plugins/python-ce/helpers/pydev/pydevconsole.py" --mode=client --host=127.0.0.1 --port=63795

import sys; print('Python %s on %s' % (sys.version, sys.platform))

sys.path.extend(['C:\Users\aurelien.thiery\PycharmProjects\pythonProject1'])

Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]

Type 'copyright', 'credits' or 'license' for more information

IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.

PyDev console: using IPython 8.5.0

Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] on win32

import pandas as pd

import saspy

print('Chargement du dataframe...')

Chargement du dataframe...

df = pd.read_pickle("c:\temp\sinis.pkl")

print('NB lignes dataframe:', len(df.index))

NB lignes dataframe: 18437883

print('Ouverture de la session SAS sur le serveur...')

Ouverture de la session SAS sur le serveur...

sas = saspy.SASsession(cfgfile='C:\Users\aurelien.thiery\PycharmProjects\sascfg_personal.py')

Using SAS Config named: winiomIWA

SAS Connection established. Subprocess id is 20904

print(sas)

Access Method = IOM

SAS Config name = winiomIWA

SAS Config file = C:\Users\aurelien.thiery\PycharmProjects\sascfg_personal.py

WORK Path = W:\WorkSAS_TD23084S00SAS017\Prc2\

SAS Version = 9.04.01M7P08052020

SASPy Version = 4.3.3

Teach me SAS = False

Batch = False

Results = Pandas

SAS Session Encoding = wlatin1

Python Encoding value = cp1252

SAS process Pid value = 23084

print('Allocation de la bibliothèque...')

Allocation de la bibliothèque...

sas.saslib(libref='datamart', path='H:\DataProj\DFICMS\02_Datamarts\202204')

5 The SAS System 16:51 Thursday, October 13, 2022

24

25 libname datamart 'H:\DataProj\DFICMS\02_Datamarts\202204' ;

NOTE: Libref DATAMART was successfully assigned as follows:

  Engine:        V9 

  Physical Name: H:\DataProj\DFICMS\02_Datamarts\202204

26

27

28

6 The SAS System 16:51 Thursday, October 13, 2022

29

print('Conversion du dataframe en table SAS...')

Conversion du dataframe en table SAS...

sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart')

C:\Users\aurelien.thiery\PycharmProjects\pythonProject1\venv\lib\site-packages\saspy\sasioiom.py:1532: UserWarning: Note that Indexes are not transferred over as columns. Only actual columns are transferred

warnings.warn("Note that Indexes are not transferred over as columns. Only actual columns are transferred")

print(sas.lastlog())

7 The SAS System 16:51 Thursday, October 13, 2022

32 ;';";*/;

33

34 data null; retain libref; retain cobs 1;

35 set sashelp.vlibnam end=last;

36 if cobs EQ 1 then

37 put "LIBREFSSTART=";

38 cobs = 2;

39 if libref NE libname then

40 put %upcase("lib=") libname %upcase('libEND=');

41 libref = libname;

42 if last then

43 put "LIBREFSEND=";

44 run;

LIBREFSSTART=

LIB=WORK LIBEND=

LIB=DATAMART LIBEND=

LIB=SASDATA LIBEND=

LIB=O21_0822 LIBEND=

LIB=O22_0822 LIBEND=

LIB=REL0822 LIBEND=

LIB=O12_0122 LIBEND=

LIB=O13_0122 LIBEND=

LIB=O14_0122 LIBEND=

LIB=O15_0122 LIBEND=

LIB=O16_0122 LIBEND=

LIB=O17_0122 LIBEND=

LIB=O18_0122 LIBEND=

LIB=O19_0122 LIBEND=

LIB=O20_0122 LIBEND=

LIB=O21_0122 LIBEND=

LIB=DGT0622 LIBEND=

LIB=IFRHIS4 LIBEND=

LIB=IFRHIS LIBEND=

LIB=IFRACT LIBEND=

LIB=O11_0121 LIBEND=

LIB=O12_0121 LIBEND=

LIB=O13_0121 LIBEND=

LIB=O14_0121 LIBEND=

LIB=O15_0121 LIBEND=

LIB=O16_0121 LIBEND=

LIB=O17_0121 LIBEND=

LIB=O18_0121 LIBEND=

LIB=O19_0121 LIBEND=

LIB=TEST LIBEND=

LIB=OSA2020 LIBEND=

LIB=OSA2021 LIBEND=

LIB=NBO2022 LIBEND=

LIB=GEO1221 LIBEND=

LIB=O20_0821 LIBEND=

LIB=O21_0821 LIBEND=

LIB=OSADTM LIBEND=

LIB=PCRM2022 LIBEND=

LIB=O20_0121 LIBEND=

LIB=PCRH2021 LIBEND=

LIB=PCRH2022 LIBEND=

LIB=PCRM2021 LIBEND=

LIB=SIMWRK LIBEND=

LIB=SM9WRK LIBEND=

LIB=SVUWRK LIBEND=

LIB=SSTWRK LIBEND=

LIB=SNHWRK LIBEND=

LIB=SA4WRK LIBEND=

LIB=SA2WRK LIBEND=

LIB=SAP0821 LIBEND=

LIB=SAP0721 LIBEND=

LIB=STPSAMP LIBEND=

LIB=SASHELP LIBEND=

LIB=MAPS LIBEND=

LIB=MAPSSAS LIBEND=

LIB=MAPSGFK LIBEND=

LIB=SASUSER LIBEND=

LIBREFSEND=

NOTE: There were 952 observations read from the data set SASHELP.VLIBNAM.

NOTE: DATA statement used (Total process time):

  real time           0.13 seconds

  cpu time            0.03 seconds

45

46

47 ;';";*/;

48

8 The SAS System 16:51 Thursday, October 13, 2022

49

50 %put E3969440A681A2408885998500000004;

E3969440A681A2408885998500000004

51

9 The SAS System 16:51 Thursday, October 13, 2022

52 data datamart.'sinistres_auth'n;

53 length 'SIN'n $10 'DBT_VALIDITE'n 8 'FIN_VALIDITE'n 8 'CR'n $3 'DATE_OUV'n 8 'DATE_SURV'n 8 'DATE_CLOS'n 8 'AN_RECOLTE'n

53 ! $3 'EVTNAT'n $5 'TYPE_GES'n $3 'ETATSIN'n $1 'LIEU'n $5 'PJ'n 8 'TOPMIG'n $3 'DATEOBS'n 8 'EVAL_BRUTE'n 8

53 ! 'EVAL_RECOURS'n 8 'EVAL_NET'n 8 'REG_PRINC'n 8 'REG_FRAIS'n 8 'REG_HONOR'n 8 'REG_RECOURS'n 8 'REG_NET'n 8 'RES_NET'n 8

53 ! 'GESTE_MONT'n 8 'ANNEE_SURV'n $4 'ANNEE_OUV'n $4 'ANNEE_CLOS'n $4 'ANNEE_OBS'n $4 'PRODUIT'n $2 'EVAL_REM'n 8 'CHARGE'n 8

53 ! ;

54 format 'DBT_VALIDITE'n E8601DT26.6 'FIN_VALIDITE'n E8601DT26.6 'DATE_OUV'n E8601DT26.6 'DATE_SURV'n E8601DT26.6

54 ! 'DATE_CLOS'n E8601DT26.6 'DATEOBS'n E8601DT26.6 ;

55 infile datalines delimiter='03'x STOPOVER;

56 input @;

57 if infile = '' then delete;

58 else do;

59 input 'SIN'n ;

60 input 'DBT_VALIDITE'n :B8601DT26.6 ;

61 input 'FIN_VALIDITE'n :B8601DT26.6 ;

62 input 'CR'n ;

63 input 'DATE_OUV'n :B8601DT26.6 ;

64 input 'DATE_SURV'n :B8601DT26.6 ;

65 input 'DATE_CLOS'n :B8601DT26.6 ;

66 input 'AN_RECOLTE'n ;

67 input 'EVTNAT'n ;

68 input 'TYPE_GES'n ;

69 input 'ETATSIN'n ;

70 input 'LIEU'n ;

71 input 'PJ'n ;

72 input 'TOPMIG'n ;

73 input 'DATEOBS'n :B8601DT26.6 ;

74 input 'EVAL_BRUTE'n ;

75 input 'EVAL_RECOURS'n ;

76 input 'EVAL_NET'n ;

77 input 'REG_PRINC'n ;

78 input 'REG_FRAIS'n ;

79 input 'REG_HONOR'n ;

80 input 'REG_RECOURS'n ;

81 input 'REG_NET'n ;

82 input 'RES_NET'n ;

83 input 'GESTE_MONT'n ;

84 input 'ANNEE_SURV'n ;

85 input 'ANNEE_OUV'n ;

86 input 'ANNEE_CLOS'n ;

87 input 'ANNEE_OBS'n ;

88 input 'PRODUIT'n ;

89 input 'EVAL_REM'n ;

90 input 'CHARGE'n ;

91 'SIN'n = translate('SIN'n, '0a'X, '01'x);

92 'SIN'n = translate('SIN'n, '0D'x, '02'x );

93 'CR'n = translate('CR'n, '0a'X, '01'x);

94 'CR'n = translate('CR'n, '0D'x, '02'x );

95 'AN_RECOLTE'n = translate('AN_RECOLTE'n, '0a'X, '01'x);

96 'AN_RECOLTE'n = translate('AN_RECOLTE'n, '0D'x, '02'x );

97 'EVTNAT'n = translate('EVTNAT'n, '0a'X, '01'x);

98 'EVTNAT'n = translate('EVTNAT'n, '0D'x, '02'x );

99 'TYPE_GES'n = translate('TYPE_GES'n, '0a'X, '01'x);

100 'TYPE_GES'n = translate('TYPE_GES'n, '0D'x, '02'x );

101 'ETATSIN'n = translate('ETATSIN'n, '0a'X, '01'x);

102 'ETATSIN'n = translate('ETATSIN'n, '0D'x, '02'x );

103 'LIEU'n = translate('LIEU'n, '0a'X, '01'x);

104 'LIEU'n = translate('LIEU'n, '0D'x, '02'x );

105 'TOPMIG'n = translate('TOPMIG'n, '0a'X, '01'x);

106 'TOPMIG'n = translate('TOPMIG'n, '0D'x, '02'x );

107 'ANNEE_SURV'n = translate('ANNEE_SURV'n, '0a'X, '01'x);

108 'ANNEE_SURV'n = translate('ANNEE_SURV'n, '0D'x, '02'x );

109 'ANNEE_OUV'n = translate('ANNEE_OUV'n, '0a'X, '01'x);

110 'ANNEE_OUV'n = translate('ANNEE_OUV'n, '0D'x, '02'x );

111 'ANNEE_CLOS'n = translate('ANNEE_CLOS'n, '0a'X, '01'x);

112 'ANNEE_CLOS'n = translate('ANNEE_CLOS'n, '0D'x, '02'x );

113 'ANNEE_OBS'n = translate('ANNEE_OBS'n, '0a'X, '01'x);

114 'ANNEE_OBS'n = translate('ANNEE_OBS'n, '0D'x, '02'x );

115 'PRODUIT'n = translate('PRODUIT'n, '0a'X, '01'x);

116 'PRODUIT'n = translate('PRODUIT'n, '0D'x, '02'x );

117 ;

118 end;

119 datalines4;

10 The SAS System 16:51 Thursday, October 13, 2022

NOTE: DATA statement used (Total process time):

  real time           0.16 seconds

  cpu time            0.14 seconds

NOTE: The data set DATAMART.SINISTRES_AUTH has 1000 observations and 32 variables.

NOTE: Compressing data set DATAMART.SINISTRES_AUTH decreased size by 0.00 percent.

  Compressed is 4 pages; un-compressed would require 4 pages.

32139 ;;;;

32140 ;;;;

32141

11 The SAS System 16:51 Thursday, October 13, 2022

32142 ;';";*/;

32143 quit;

32144

32145 ;';";*/;

32146

12 The SAS System 16:51 Thursday, October 13, 2022

32147

32148 %put E3969440A681A2408885998500000005;

E3969440A681A2408885998500000005

32149

13 The SAS System 16:51 Thursday, October 13, 2022

32150 ;';";*/;

32151 data null; e = exist("datamart.'sinistres_auth'n");

32152 v = exist("datamart.'sinistres_auth'n", 'VIEW');

32153 if e or v then e = 1;

32154 te='TABLE_EXISTS='; put te e;run;

TABLE_EXISTS= 1

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32155

32156

32157 ;';";*/;

32158

14 The SAS System 16:51 Thursday, October 13, 2022

32159

32160 %put E3969440A681A2408885998500000006;

E3969440A681A2408885998500000006

32161

15 The SAS System 16:51 Thursday, October 13, 2022

32162 ;';";*/;

32163 %let engine=BAD;

32164 proc sql;

32164 ! select distinct engine into :engine from sashelp.VLIBNAM where libname = 'DATAMART';

32164 ! ;%put engstart=&engine

32164 ! engend=;

engstart=V9 engend=

32165 quit;

NOTE: The PROCEDURE SQL printed page 1.

NOTE: PROCEDURE SQL used (Total process time):

  real time           0.01 seconds

  cpu time            0.03 seconds

32166

32167 ;';";*/;

32168

16 The SAS System 16:51 Thursday, October 13, 2022

32169

32170 %put E3969440A681A2408885998500000007;

E3969440A681A2408885998500000007

32171

df.shape

Out[15]: (18437883, 32)

df.info()

<class 'pandas.core.frame.DataFrame'>

Int64Index: 18437883 entries, 0 to 380710

Data columns (total 32 columns):

Column Dtype


0 SIN object

1 DBT_VALIDITE datetime64[ns]

2 FIN_VALIDITE datetime64[ns]

3 CR object

4 DATE_OUV datetime64[ns]

5 DATE_SURV datetime64[ns]

6 DATE_CLOS datetime64[ns]

7 AN_RECOLTE object

8 EVTNAT object

9 TYPE_GES object

10 ETATSIN object

11 LIEU object

12 PJ float64

13 TOPMIG object

14 DATEOBS datetime64[ns]

15 EVAL_BRUTE float64

16 EVAL_RECOURS float64

17 EVAL_NET float64

18 REG_PRINC float64

19 REG_FRAIS float64

20 REG_HONOR float64

21 REG_RECOURS float64

22 REG_NET float64

23 RES_NET float64

24 GESTE_MONT float64

25 ANNEE_SURV object

26 ANNEE_OUV object

27 ANNEE_CLOS object

28 ANNEE_OBS object

29 PRODUIT object

30 EVAL_REM float64

31 CHARGE float64

dtypes: datetime64ns, float64(13), object(13)

memory usage: 4.5+ GB

sas.df_char_lengths(df[:1000])

Out[17]:

{'SIN': 10,

'CR': 3,

'AN_RECOLTE': 3,

'EVTNAT': 5,

'TYPE_GES': 3,

'ETATSIN': 1,

'LIEU': 5,

'TOPMIG': 3,

'ANNEE_SURV': 4,

'ANNEE_OUV': 4,

'ANNEE_CLOS': 4,

'ANNEE_OBS': 4,

'PRODUIT': 2}

sd.columnInfo()

Out[18]:

                 Member   Num      Variable  Type   Len    Pos       Format

0 DATAMART.SINISTRES_AUTH 28.0 ANNEE_CLOS Char 4.0 193.0 NaN

1 DATAMART.SINISTRES_AUTH 29.0 ANNEE_OBS Char 4.0 197.0 NaN

2 DATAMART.SINISTRES_AUTH 27.0 ANNEE_OUV Char 4.0 189.0 NaN

3 DATAMART.SINISTRES_AUTH 26.0 ANNEE_SURV Char 4.0 185.0 NaN

4 DATAMART.SINISTRES_AUTH 8.0 AN_RECOLTE Char 3.0 165.0 NaN

5 DATAMART.SINISTRES_AUTH 32.0 CHARGE Num 8.0 144.0 NaN

6 DATAMART.SINISTRES_AUTH 4.0 CR Char 3.0 162.0 NaN

7 DATAMART.SINISTRES_AUTH 15.0 DATEOBS Num 8.0 48.0 E8601DT26.6

8 DATAMART.SINISTRES_AUTH 7.0 DATE_CLOS Num 8.0 32.0 E8601DT26.6

9 DATAMART.SINISTRES_AUTH 5.0 DATE_OUV Num 8.0 16.0 E8601DT26.6

10 DATAMART.SINISTRES_AUTH 6.0 DATE_SURV Num 8.0 24.0 E8601DT26.6

11 DATAMART.SINISTRES_AUTH 2.0 DBT_VALIDITE Num 8.0 0.0 E8601DT26.6

12 DATAMART.SINISTRES_AUTH 11.0 ETATSIN Char 1.0 176.0 NaN

13 DATAMART.SINISTRES_AUTH 16.0 EVAL_BRUTE Num 8.0 56.0 NaN

14 DATAMART.SINISTRES_AUTH 18.0 EVAL_NET Num 8.0 72.0 NaN

15 DATAMART.SINISTRES_AUTH 17.0 EVAL_RECOURS Num 8.0 64.0 NaN

16 DATAMART.SINISTRES_AUTH 31.0 EVAL_REM Num 8.0 136.0 NaN

17 DATAMART.SINISTRES_AUTH 9.0 EVTNAT Char 5.0 168.0 NaN

18 DATAMART.SINISTRES_AUTH 3.0 FIN_VALIDITE Num 8.0 8.0 E8601DT26.6

19 DATAMART.SINISTRES_AUTH 25.0 GESTE_MONT Num 8.0 128.0 NaN

20 DATAMART.SINISTRES_AUTH 12.0 LIEU Char 5.0 177.0 NaN

21 DATAMART.SINISTRES_AUTH 13.0 PJ Num 8.0 40.0 NaN

22 DATAMART.SINISTRES_AUTH 30.0 PRODUIT Char 2.0 201.0 NaN

23 DATAMART.SINISTRES_AUTH 20.0 REG_FRAIS Num 8.0 88.0 NaN

24 DATAMART.SINISTRES_AUTH 21.0 REG_HONOR Num 8.0 96.0 NaN

25 DATAMART.SINISTRES_AUTH 23.0 REG_NET Num 8.0 112.0 NaN

26 DATAMART.SINISTRES_AUTH 19.0 REG_PRINC Num 8.0 80.0 NaN

27 DATAMART.SINISTRES_AUTH 22.0 REG_RECOURS Num 8.0 104.0 NaN

28 DATAMART.SINISTRES_AUTH 24.0 RES_NET Num 8.0 120.0 NaN

29 DATAMART.SINISTRES_AUTH 1.0 SIN Char 10.0 152.0 NaN

30 DATAMART.SINISTRES_AUTH 14.0 TOPMIG Char 3.0 182.0 NaN

31 DATAMART.SINISTRES_AUTH 10.0 TYPE_GES Char 3.0 173.0 NaN

sd.obs()

Out[19]: 1000

sd.contents(results='text')

                                                       The SAS System                       16:51 Thursday, October 13, 2022   3

                                                   The CONTENTS Procedure

                       Data Set Name        DATAMART.SINISTRES_AUTH       Observations           1000

                       Member Type          DATA                          Variables              32  

                       Engine               V9                            Indexes                0   

                       Created              13/10/2022 16:52:02           Observation Length     203 

                       Last Modified        13/10/2022 16:52:02           Deleted Observations   0   

                       Protection                                         Compressed             CHAR

                       Data Set Type                                      Reuse Space            NO  

                       Label                                              Point to Observations  YES 

                       Data Representation  WINDOWS_64                    Sorted                 NO  

                       Encoding             wlatin1  Western (Windows)                               

                                             Engine/Host Dependent Information

                 Data Set Page Size          65536                                                         

                 Number of Data Set Pages    4                                                             

                 Number of Data Set Repairs  0                                                             

                 ExtendObsCounter            YES                                                           

                 Filename                    H:\DataProj\DFICMS\02_Datamarts\202204\sinistres_auth.sas7bdat

                 Release Created             9.0401M7                                                      

                 Host Created                X64_SRV19                                                     

                 Owner Name                  CA-PACIFICA\aurelien.thiery                                   

                 File Size                   320KB                                                         

                 File Size (bytes)           327680                                                        

                                         Alphabetic List of Variables and Attributes

                                       #    Variable        Type    Len    Format

                                      28    ANNEE_CLOS      Char      4               

                                      29    ANNEE_OBS       Char      4               

                                      27    ANNEE_OUV       Char      4               

                                      26    ANNEE_SURV      Char      4               

                                       8    AN_RECOLTE      Char      3               

                                      32    CHARGE          Num       8               

                                       4    CR              Char      3               

                                      15    DATEOBS         Num       8    E8601DT26.6

                                       7    DATE_CLOS       Num       8    E8601DT26.6

                                       5    DATE_OUV        Num       8    E8601DT26.6

                                       6    DATE_SURV       Num       8    E8601DT26.6

                                       2    DBT_VALIDITE    Num       8    E8601DT26.6

                                      11    ETATSIN         Char      1               

                                      16    EVAL_BRUTE      Num       8               

                                      18    EVAL_NET        Num       8               

                                      17    EVAL_RECOURS    Num       8               

                                      31    EVAL_REM        Num       8               

                                       9    EVTNAT          Char      5               

                                       3    FIN_VALIDITE    Num       8    E8601DT26.6

                                      25    GESTE_MONT      Num       8               

                                      12    LIEU            Char      5               

                                      13    PJ              Num       8               

                                      30    PRODUIT         Char      2               

                                      20    REG_FRAIS       Num       8               

                                      21    REG_HONOR       Num       8               

                                      23    REG_NET         Num       8               

                                      19    REG_PRINC       Num       8               

                                      22    REG_RECOURS     Num       8               

                                      24    RES_NET         Num       8               

                                       1    SIN             Char     10               

                                      14    TOPMIG          Char      3               

                                      10    TYPE_GES        Char      3               

print(sas.saslog())

1 The SAS System 16:51 Thursday, October 13, 2022

NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.

NOTE: SAS (r) Proprietary Software 9.4 (TS1M7)

  Licensed to PACIFICA - SERVEUR DE DEVELOPPEMENT, Site 50110812.

NOTE: This session is executing on the X64_SRV19 platform.

NOTE: Analytical products:

  SAS/STAT 15.2

NOTE: Additional host information:

X64_SRV19 WIN 10.0.17763 Server

NOTE: SAS Initialization used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

NOTE: The autoexec file, I:\SASConfig\Lev1\Infocentre\WorkspaceServer\autoexec.sas, was executed at server initialization.

1 ;';";*/;

2 options svgtitle='svgtitle'; options validvarname=any validmemname=extend pagesize=max nosyntaxcheck; ods graphics on;

3

4 ;';";*/;

5

2 The SAS System 16:51 Thursday, October 13, 2022

6

7 %put E3969440A681A2408885998500000001;

E3969440A681A2408885998500000001

8

3 The SAS System 16:51 Thursday, October 13, 2022

9 ;';";*/;

10 data null; length x $ 4096;

11 x = resolve('%sysfunc(pathname(work))'); put 'WORKPATH=' x 'WORKPATHEND=';

12 x = resolve('&SYSENCODING'); put 'ENCODING=' x 'ENCODINGEND=';

13 x = resolve('&SYSVLONG4'); put 'SYSVLONG=' x 'SYSVLONGEND=';

14 x = resolve('&SYSJOBID'); put 'SYSJOBID=' x 'SYSJOBIDEND=';

15 x = resolve('&SYSSCP'); put 'SYSSCP=' x 'SYSSCPEND=';

16 run;

WORKPATH=W:\WorkSAS_TD23084S00SAS017\Prc2 WORKPATHEND=

ENCODING=wlatin1 ENCODINGEND=

SYSVLONG=9.04.01M7P08052020 SYSVLONGEND=

SYSJOBID=23084 SYSJOBIDEND=

SYSSCP=WIN SYSSCPEND=

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.01 seconds

17

18

19 ;';";*/;

20

4 The SAS System 16:51 Thursday, October 13, 2022

21

22 %put E3969440A681A2408885998500000002;

E3969440A681A2408885998500000002

23

5 The SAS System 16:51 Thursday, October 13, 2022

24 ;';";*/;

25 libname datamart 'H:\DataProj\DFICMS\02_Datamarts\202204' ;

NOTE: Libref DATAMART was successfully assigned as follows:

  Engine:        V9 

  Physical Name: H:\DataProj\DFICMS\02_Datamarts\202204

26

27 ;';";*/;

28

6 The SAS System 16:51 Thursday, October 13, 2022

29

30 %put E3969440A681A2408885998500000003;

E3969440A681A2408885998500000003

31

7 The SAS System 16:51 Thursday, October 13, 2022

32 ;';";*/;

33

34 data null; retain libref; retain cobs 1;

35 set sashelp.vlibnam end=last;

36 if cobs EQ 1 then

37 put "LIBREFSSTART=";

38 cobs = 2;

39 if libref NE libname then

40 put %upcase("lib=") libname %upcase('libEND=');

41 libref = libname;

42 if last then

43 put "LIBREFSEND=";

44 run;

LIBREFSSTART=

LIB=WORK LIBEND=

LIB=DATAMART LIBEND=

LIB=SASDATA LIBEND=

LIB=O21_0822 LIBEND=

LIB=O22_0822 LIBEND=

LIB=REL0822 LIBEND=

LIB=O12_0122 LIBEND=

LIB=O13_0122 LIBEND=

LIB=O14_0122 LIBEND=

LIB=O15_0122 LIBEND=

LIB=O16_0122 LIBEND=

LIB=O17_0122 LIBEND=

LIB=O18_0122 LIBEND=

LIB=O19_0122 LIBEND=

LIB=O20_0122 LIBEND=

LIB=O21_0122 LIBEND=

LIB=DGT0622 LIBEND=

LIB=IFRHIS4 LIBEND=

LIB=IFRHIS LIBEND=

LIB=IFRACT LIBEND=

LIB=O11_0121 LIBEND=

LIB=O12_0121 LIBEND=

LIB=O13_0121 LIBEND=

LIB=O14_0121 LIBEND=

LIB=O15_0121 LIBEND=

LIB=O16_0121 LIBEND=

LIB=O17_0121 LIBEND=

LIB=O18_0121 LIBEND=

LIB=O19_0121 LIBEND=

LIB=TEST LIBEND=

LIB=OSA2020 LIBEND=

LIB=OSA2021 LIBEND=

LIB=NBO2022 LIBEND=

LIB=GEO1221 LIBEND=

LIB=O20_0821 LIBEND=

LIB=O21_0821 LIBEND=

LIB=OSADTM LIBEND=

LIB=PCRM2022 LIBEND=

LIB=O20_0121 LIBEND=

LIB=PCRH2021 LIBEND=

LIB=PCRH2022 LIBEND=

LIB=PCRM2021 LIBEND=

LIB=SIMWRK LIBEND=

LIB=SM9WRK LIBEND=

LIB=SVUWRK LIBEND=

LIB=SSTWRK LIBEND=

LIB=SNHWRK LIBEND=

LIB=SA4WRK LIBEND=

LIB=SA2WRK LIBEND=

LIB=SAP0821 LIBEND=

LIB=SAP0721 LIBEND=

LIB=STPSAMP LIBEND=

LIB=SASHELP LIBEND=

LIB=MAPS LIBEND=

LIB=MAPSSAS LIBEND=

LIB=MAPSGFK LIBEND=

LIB=SASUSER LIBEND=

LIBREFSEND=

NOTE: There were 952 observations read from the data set SASHELP.VLIBNAM.

NOTE: DATA statement used (Total process time):

  real time           0.13 seconds

  cpu time            0.03 seconds

45

46

47 ;';";*/;

48

8 The SAS System 16:51 Thursday, October 13, 2022

49

50 %put E3969440A681A2408885998500000004;

E3969440A681A2408885998500000004

51

9 The SAS System 16:51 Thursday, October 13, 2022

52 data datamart.'sinistres_auth'n;

53 length 'SIN'n $10 'DBT_VALIDITE'n 8 'FIN_VALIDITE'n 8 'CR'n $3 'DATE_OUV'n 8 'DATE_SURV'n 8 'DATE_CLOS'n 8 'AN_RECOLTE'n

53 ! $3 'EVTNAT'n $5 'TYPE_GES'n $3 'ETATSIN'n $1 'LIEU'n $5 'PJ'n 8 'TOPMIG'n $3 'DATEOBS'n 8 'EVAL_BRUTE'n 8

53 ! 'EVAL_RECOURS'n 8 'EVAL_NET'n 8 'REG_PRINC'n 8 'REG_FRAIS'n 8 'REG_HONOR'n 8 'REG_RECOURS'n 8 'REG_NET'n 8 'RES_NET'n 8

53 ! 'GESTE_MONT'n 8 'ANNEE_SURV'n $4 'ANNEE_OUV'n $4 'ANNEE_CLOS'n $4 'ANNEE_OBS'n $4 'PRODUIT'n $2 'EVAL_REM'n 8 'CHARGE'n 8

53 ! ;

54 format 'DBT_VALIDITE'n E8601DT26.6 'FIN_VALIDITE'n E8601DT26.6 'DATE_OUV'n E8601DT26.6 'DATE_SURV'n E8601DT26.6

54 ! 'DATE_CLOS'n E8601DT26.6 'DATEOBS'n E8601DT26.6 ;

55 infile datalines delimiter='03'x STOPOVER;

56 input @;

57 if infile = '' then delete;

58 else do;

59 input 'SIN'n ;

60 input 'DBT_VALIDITE'n :B8601DT26.6 ;

61 input 'FIN_VALIDITE'n :B8601DT26.6 ;

62 input 'CR'n ;

63 input 'DATE_OUV'n :B8601DT26.6 ;

64 input 'DATE_SURV'n :B8601DT26.6 ;

65 input 'DATE_CLOS'n :B8601DT26.6 ;

66 input 'AN_RECOLTE'n ;

67 input 'EVTNAT'n ;

68 input 'TYPE_GES'n ;

69 input 'ETATSIN'n ;

70 input 'LIEU'n ;

71 input 'PJ'n ;

72 input 'TOPMIG'n ;

73 input 'DATEOBS'n :B8601DT26.6 ;

74 input 'EVAL_BRUTE'n ;

75 input 'EVAL_RECOURS'n ;

76 input 'EVAL_NET'n ;

77 input 'REG_PRINC'n ;

78 input 'REG_FRAIS'n ;

79 input 'REG_HONOR'n ;

80 input 'REG_RECOURS'n ;

81 input 'REG_NET'n ;

82 input 'RES_NET'n ;

83 input 'GESTE_MONT'n ;

84 input 'ANNEE_SURV'n ;

85 input 'ANNEE_OUV'n ;

86 input 'ANNEE_CLOS'n ;

87 input 'ANNEE_OBS'n ;

88 input 'PRODUIT'n ;

89 input 'EVAL_REM'n ;

90 input 'CHARGE'n ;

91 'SIN'n = translate('SIN'n, '0a'X, '01'x);

92 'SIN'n = translate('SIN'n, '0D'x, '02'x );

93 'CR'n = translate('CR'n, '0a'X, '01'x);

94 'CR'n = translate('CR'n, '0D'x, '02'x );

95 'AN_RECOLTE'n = translate('AN_RECOLTE'n, '0a'X, '01'x);

96 'AN_RECOLTE'n = translate('AN_RECOLTE'n, '0D'x, '02'x );

97 'EVTNAT'n = translate('EVTNAT'n, '0a'X, '01'x);

98 'EVTNAT'n = translate('EVTNAT'n, '0D'x, '02'x );

99 'TYPE_GES'n = translate('TYPE_GES'n, '0a'X, '01'x);

100 'TYPE_GES'n = translate('TYPE_GES'n, '0D'x, '02'x );

101 'ETATSIN'n = translate('ETATSIN'n, '0a'X, '01'x);

102 'ETATSIN'n = translate('ETATSIN'n, '0D'x, '02'x );

103 'LIEU'n = translate('LIEU'n, '0a'X, '01'x);

104 'LIEU'n = translate('LIEU'n, '0D'x, '02'x );

105 'TOPMIG'n = translate('TOPMIG'n, '0a'X, '01'x);

106 'TOPMIG'n = translate('TOPMIG'n, '0D'x, '02'x );

107 'ANNEE_SURV'n = translate('ANNEE_SURV'n, '0a'X, '01'x);

108 'ANNEE_SURV'n = translate('ANNEE_SURV'n, '0D'x, '02'x );

109 'ANNEE_OUV'n = translate('ANNEE_OUV'n, '0a'X, '01'x);

110 'ANNEE_OUV'n = translate('ANNEE_OUV'n, '0D'x, '02'x );

111 'ANNEE_CLOS'n = translate('ANNEE_CLOS'n, '0a'X, '01'x);

112 'ANNEE_CLOS'n = translate('ANNEE_CLOS'n, '0D'x, '02'x );

113 'ANNEE_OBS'n = translate('ANNEE_OBS'n, '0a'X, '01'x);

114 'ANNEE_OBS'n = translate('ANNEE_OBS'n, '0D'x, '02'x );

115 'PRODUIT'n = translate('PRODUIT'n, '0a'X, '01'x);

116 'PRODUIT'n = translate('PRODUIT'n, '0D'x, '02'x );

117 ;

118 end;

119 datalines4;

10 The SAS System 16:51 Thursday, October 13, 2022

NOTE: DATA statement used (Total process time):

  real time           0.16 seconds

  cpu time            0.14 seconds

NOTE: The data set DATAMART.SINISTRES_AUTH has 1000 observations and 32 variables.

NOTE: Compressing data set DATAMART.SINISTRES_AUTH decreased size by 0.00 percent.

  Compressed is 4 pages; un-compressed would require 4 pages.

32139 ;;;;

32140 ;;;;

32141

11 The SAS System 16:51 Thursday, October 13, 2022

32142 ;';";*/;

32143 quit;

32144

32145 ;';";*/;

32146

12 The SAS System 16:51 Thursday, October 13, 2022

32147

32148 %put E3969440A681A2408885998500000005;

E3969440A681A2408885998500000005

32149

13 The SAS System 16:51 Thursday, October 13, 2022

32150 ;';";*/;

32151 data null; e = exist("datamart.'sinistres_auth'n");

32152 v = exist("datamart.'sinistres_auth'n", 'VIEW');

32153 if e or v then e = 1;

32154 te='TABLE_EXISTS='; put te e;run;

TABLE_EXISTS= 1

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32155

32156

32157 ;';";*/;

32158

14 The SAS System 16:51 Thursday, October 13, 2022

32159

32160 %put E3969440A681A2408885998500000006;

E3969440A681A2408885998500000006

32161

15 The SAS System 16:51 Thursday, October 13, 2022

32162 ;';";*/;

32163 %let engine=BAD;

32164 proc sql;

32164 ! select distinct engine into :engine from sashelp.VLIBNAM where libname = 'DATAMART';

32164 ! ;%put engstart=&engine

32164 ! engend=;

engstart=V9 engend=

32165 quit;

NOTE: The PROCEDURE SQL printed page 1.

NOTE: PROCEDURE SQL used (Total process time):

  real time           0.01 seconds

  cpu time            0.03 seconds

32166

32167 ;';";*/;

32168

16 The SAS System 16:51 Thursday, October 13, 2022

32169

32170 %put E3969440A681A2408885998500000007;

E3969440A681A2408885998500000007

32171

17 The SAS System 16:51 Thursday, October 13, 2022

32172 ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;

32172 ! ods graphics on / outputfmt=png;

NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1

32173 ;';";*/;

32174 proc contents data=datamart.'sinistres_auth'n ;ods output Variables=work._variables ;run;

NOTE: PROCEDURE CONTENTS used (Total process time):

  real time           0.04 seconds

  cpu time            0.03 seconds

NOTE: The data set WORK._VARIABLES has 32 observations and 7 variables.

NOTE: Compressing data set WORK._VARIABLES increased size by 100.00 percent.

  Compressed is 2 pages; un-compressed would require 1 pages.

32175

32176 ;';";*/;

32177 ods html5 (id=saspy_internal) close;ods listing;

32178

18 The SAS System 16:51 Thursday, October 13, 2022

32179

32180 %put E3969440A681A2408885998500000008;

E3969440A681A2408885998500000008

32181

19 The SAS System 16:51 Thursday, October 13, 2022

32182 ;';";*/;

32183 data null; e = exist("work.'_variables'n");

32184 v = exist("work.'_variables'n", 'VIEW');

32185 if e or v then e = 1;

32186 te='TABLE_EXISTS='; put te e;run;

TABLE_EXISTS= 1

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32187

32188

32189 ;';";*/;

32190

20 The SAS System 16:51 Thursday, October 13, 2022

32191

32192 %put E3969440A681A2408885998500000009;

E3969440A681A2408885998500000009

32193

21 The SAS System 16:51 Thursday, October 13, 2022

32194 ;';";*/;

32195 data work.sasdata2dataframe / view=work.sasdata2dataframe; set work.'_variables'n ;run;

NOTE: DATA STEP view saved on file WORK.SASDATA2DATAFRAME.

NOTE: A stored DATA STEP view cannot run under a different operating system.

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32196 data null; file LOG; d = open('work.sasdata2dataframe');

32197 length var $256;

32198 lrecl = attrn(d, 'LRECL'); nvars = attrn(d, 'NVARS');

32199 lr='LRECL='; vn='VARNUMS='; vl='VARLIST='; vt='VARTYPE=';

32200 put lr lrecl; put vn nvars; put vl;

32201 do i = 1 to nvars; var = compress(varname(d, i), '00'x); put var; end;

32202 put vt;

32203 do i = 1 to nvars; var = vartype(d, i); put var; end;

32204 run;

LRECL= 327

VARNUMS= 7

VARLIST=

Member

Num

Variable

Type

Len

Pos

Format

VARTYPE=

C

N

C

C

N

N

C

NOTE: View WORK.SASDATA2DATAFRAME.VIEW used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32205

32206 ;';";*/;

32207

22 The SAS System 16:51 Thursday, October 13, 2022

32208

32209 %put E3969440A681A2408885998500000011;

E3969440A681A2408885998500000011

32210

23 The SAS System 16:51 Thursday, October 13, 2022

32211 ;';";*/;

32212 proc delete data=work.sasdata2dataframe(memtype=view);run;

NOTE: Deleting WORK.SASDATA2DATAFRAME (memtype=VIEW).

NOTE: PROCEDURE DELETE used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32213 data work._n_u_ll;output;run;

NOTE: Compression was disabled for data set WORK._N_U_LL because compression overhead would increase the size of the data set.

NOTE: The data set WORK._N_U_LL has 1 observations and 0 variables.

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32214 data null; set work._n_u_ll work.'_variables'n ;put 'FMT_CATS=';

32215 _tom = vformatn('Member'n);put _tom;

32216 _tom = vformatn('Num'n);put _tom;

32217 _tom = vformatn('Variable'n);put _tom;

32218 _tom = vformatn('Type'n);put _tom;

32219 _tom = vformatn('Len'n);put _tom;

32220 _tom = vformatn('Pos'n);put _tom;

32221 _tom = vformatn('Format'n);put _tom;

32222 stop;

32223 run;

FMT_CATS=

$

F

$

$

F

BEST

$

NOTE: There were 1 observations read from the data set WORK._N_U_LL.

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.01 seconds

32224 proc delete data=work._n_u_ll;run;

NOTE: Deleting WORK._N_U_LL (memtype=DATA).

NOTE: PROCEDURE DELETE used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32225

32226 ;';";*/;

32227

24 The SAS System 16:51 Thursday, October 13, 2022

32228

32229 %put E3969440A681A2408885998500000012;

E3969440A681A2408885998500000012

32230

25 The SAS System 16:51 Thursday, October 13, 2022

32231 data null; set work.'_variables'n ;

32232 format 'Num'n best32.; format 'Len'n best32.; format 'Pos'n best32.;

32233 file _tomods1 lrecl=1048576 dlm='02'x recfm=v termstr=NL encoding='utf-8';

32234 'Member'n = translate('Member'n, '2020'x, '0102'x); if missing('Num'n) then 'Num'n = .; 'Variable'n =

32234 ! translate('Variable'n, '2020'x, '0102'x); 'Type'n = translate('Type'n, '2020'x, '0102'x); if missing('Len'n) then 'Len'n

32234 ! = .; if missing('Pos'n) then 'Pos'n = .; 'Format'n = translate('Format'n, '2020'x, '0102'x);

32235 put 'Member'n 'Num'n 'Variable'n 'Type'n 'Len'n 'Pos'n 'Format'n '01'x;

32236 run;

NOTE: The file _TOMODS1 is:

  Filename=W:\WorkSAS\_TD23084_S00SAS017_\Prc2\_tomods1,

  RECFM=V,LRECL=4194304,File Size (bytes)=0,

  Last Modified=13 octobre 2022 16 h 53,

  Create Time=13 octobre 2022 16 h 53

NOTE: 32 records were written to the file _TOMODS1.

  The minimum record length was 42.

  The maximum record length was 60.

NOTE: There were 32 observations read from the data set WORK._VARIABLES.

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32237

26 The SAS System 16:51 Thursday, October 13, 2022

32238

32239

27 The SAS System 16:51 Thursday, October 13, 2022

32240

32241 %put E3969440A681A2408885998500000010;

E3969440A681A2408885998500000010

32242

28 The SAS System 16:51 Thursday, October 13, 2022

32243 ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;

32243 ! ods graphics on / outputfmt=png;

NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1

32244 ;';";*/;

32245 proc delete data=work._variables; run;

NOTE: Deleting WORK._VARIABLES (memtype=DATA).

NOTE: PROCEDURE DELETE used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32246

32247 ;';";*/;

32248 ods html5 (id=saspy_internal) close;ods listing;

32249

29 The SAS System 16:51 Thursday, October 13, 2022

32250

32251 %put E3969440A681A2408885998500000013;

E3969440A681A2408885998500000013

32252

30 The SAS System 16:51 Thursday, October 13, 2022

32253 ;';";*/;

32254 data null; e = exist("datamart.'sinistres_auth'n");

32255 v = exist("datamart.'sinistres_auth'n", 'VIEW');

32256 if e or v then e = 1;

32257 te='TABLE_EXISTS='; put te e;run;

TABLE_EXISTS= 1

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32258

32259

32260 ;';";*/;

32261

31 The SAS System 16:51 Thursday, October 13, 2022

32262

32263 %put E3969440A681A2408885998500000014;

E3969440A681A2408885998500000014

32264

32 The SAS System 16:51 Thursday, October 13, 2022

32265 ;';";*/;

32266 %let lastobs=-1;

32267 proc sql;

32267 ! select count(*) format best32. into :lastobs from datamart.'sinistres_auth'n ;

32267 ! %put lastobs=&lastobs lastobsend=;

lastobs= 1000 lastobsend=

32268 quit;

NOTE: The PROCEDURE SQL printed page 2.

NOTE: PROCEDURE SQL used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32269

32270 ;';";*/;

32271

33 The SAS System 16:51 Thursday, October 13, 2022

32272

32273 %put E3969440A681A2408885998500000015;

E3969440A681A2408885998500000015

32274

34 The SAS System 16:51 Thursday, October 13, 2022

32275 ;';";*/;

32276 data null; e = exist("datamart.'sinistres_auth'n");

32277 v = exist("datamart.'sinistres_auth'n", 'VIEW');

32278 if e or v then e = 1;

32279 te='TABLE_EXISTS='; put te e;run;

TABLE_EXISTS= 1

NOTE: DATA statement used (Total process time):

  real time           0.00 seconds

  cpu time            0.00 seconds

32280

32281

32282 ;';";*/;

32283

35 The SAS System 16:51 Thursday, October 13, 2022

32284

32285 %put E3969440A681A2408885998500000016;

E3969440A681A2408885998500000016

32286

36 The SAS System 16:51 Thursday, October 13, 2022

32287 ;';";*/;

32288 proc contents data=datamart.'sinistres_auth'n ;run;

NOTE: PROCEDURE CONTENTS used (Total process time):

  real time           0.00 seconds

  cpu time            0.01 seconds

NOTE: The PROCEDURE CONTENTS printed page 3.

32289

32290 ;';";*/;

32291

37 The SAS System 16:51 Thursday, October 13, 2022

32292

32293 %put E3969440A681A2408885998500000017;

E3969440A681A2408885998500000017

32294

Then i executed this script as you want :

import pandas as pd

import saspy

import time

print('Chargement du dataframe...')

df = pd.read_pickle("c:\temp\sinis.pkl")

print('Ouverture de la session SAS sur le serveur...')

sas = saspy.SASsession(cfgfile='C:\Users\aurelien.thiery\PycharmProjects\sascfg_personal.py')

start = time.time()

cl = sas.df_char_lengths(df)

print('time= ', time.time()-start)

print(cl)

And the result :

Chargement du dataframe...

Ouverture de la session SAS sur le serveur...

Using SAS Config named: winiomIWA

SAS Connection established. Subprocess id is 17412

time= 70.60996747016907

{'SIN': 10, 'CR': 3, 'AN_RECOLTE': 4, 'EVTNAT': 5, 'TYPE_GES': 3, 'ETATSIN': 1, 'LIEU': 5, 'TOPMIG': 3, 'ANNEE_SURV': 4, 'ANNEE_OUV': 4, 'ANNEE_CLOS': 6, 'ANNEE_OBS': 4, 'PRODUIT': 2}

Aurélien

De : Tom Weber @.> Envoyé : jeudi 13 octobre 2022 16:08 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

I don't see any of the output for the dataframe info. Can you just run python interactively and submit all the code, then I can actually see what code was executed and get all of the output.

python

import pandas as pd

import saspy

print('Chargement du dataframe...')

df = pd.read_pickle("c:\temp\sinis.pkl")

print('NB lignes dataframe:', len(df.index))

print('Ouverture de la session SAS sur le serveur...')

sas = saspy.SASsession(cfgfile='C:\Users\aurelien.thiery\PycharmProjects\sascfg_personal.py')

print(sas)

print('Allocation de la bibliothèque...')

sas.saslib(libref='datamart', path='H:\DataProj\DFICMS\02_Datamarts\202204')

print('Conversion du dataframe en table SAS...')

sd = sas.df2sd(df[:1000], 'sinistres_auth', 'datamart')

print(sas.lastlog())

df.shape

df.info()

sas.df_char_lengths(df[:1000])

sd.columnInfo()

df.shape

sd.obs()

sd.contents(results='text')

print(sas.saslog())

Also, I'd be curious to know the following (on the full df):

import time

start = time.time()

cl = sas.df_char_lengths(df)

print('time= ', time.time()-start)

print(cl)

Thanks! Tom

— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1277672844 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KAPJ24EWKYMVCYHHGTWDAJUJANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KA7Q3K3HBWMDRWN4E3WDAJUJA5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSME66YY.gif Message ID: @. @.> >

tomweber-sas commented 1 year ago

That's great, thanks! It all shows up that way. Also, good news on my side; I was able to resize my linux development machine (it's virtual like everything else these days), which is good on all accounts. So, I was just able to load that pickle file and run the code here too! I'm in the middle of running df2sd() now. It's not the same as yours since I'm on linux and I'm actually using IOM Remote, but it's the same code, for the most part, and I'm seeing what I kinda expected. My SAS process is still only around 70M of memory, but that's probably because it's only set for the default memsize. I expect in your case that since you set memsize way higher, SAS is simply using it even though there's no requirement; it's streaming the data to disk, not storing it all in memory. But if you give SAS lots of mem, it tries to use it for things, so I think your 30g is 'just because it can', as there's no other reason it would be using that memory. Just a data step streaming the data to disk.

Your sas.df_char_lengths(df) only took 70 seconds, while on my side it took 215 (got the same results too, as it should!). So, that's good on your side!

>>> start = time.time()
>>> cl = sas.df_char_lengths(df)
>>> print('time= ', time.time()-start)
time=  215.42656183242798
>>> print(cl)
{'SIN': 10, 'CR': 3, 'AN_RECOLTE': 4, 'EVTNAT': 5, 'TYPE_GES': 3, 'ETATSIN': 1, 'LIEU': 7, 'TOPMIG': 3, 'ANNEE_SURV': 4, 'ANNEE_OUV': 4, 'ANNEE_CLOS': 6, 'ANNEE_OBS': 4, 'PRODUIT': 2}
>>>

Here's my status as I'm still in the middle of running this:

image

tomweber-sas commented 1 year ago

Well, so my df2sd() finished, at least it seemed so from both systems; python client machine and the SAS server machine. But, I never got control back in my python process; seemed saspy hung at the (presumably) end of the df2df step. So, maybe that's what you were seeing too. I'll have to investigate this and see what's happening. Hopefully, I just reproduced what you saw and when I figure out what happened, I'll be able to get it working for your case as well.

So, that's good, at least. I'll need to spend some time tracking this down. I'll post when after I've had a chance to track this. Doesn't help it takes a long time to run, but it seems to be something to do with transferring this volume of data.

Thanks! Tom

tomweber-sas commented 1 year ago

Ok, I believe I've reproduced and captured the failure. though it makes no sense as of yet. I need to do some more investigation to understand why. It's actually a problem on the SAS side and it accounts for both the excessive memory (which it runs out of) and the hang back in saspy, as it disconnects the client, but my client doesn't happen to catch that and fail. I need to see why, but probably waiting for output that never comes or something like that where I'm not getting a failure from an IOM client call. Either way, I believe it's the same problem you're running into. So, that's progress. More to come ...

AurelienThiery commented 1 year ago

Ok thanks for the news.AurélienLe 13 oct. 2022 à 22:08, Tom Weber @.***> a écrit : Ok, I believe I've reproduced and captured the failure. though it makes no sense as of yet. I need to do some more investigation to understand why. It's actually a problem on the SAS side and it accounts for both the excessive memory (which it runs out of) and the hang back in saspy, as it disconnects the client, but my client doesn't happen to catch that and fail. I need to see why, but probably waiting for output that never comes or something like that where I'm not getting a failure from an IOM client call. Either way, I believe it's the same problem you're running into. So, that's progress. More to come ...

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

tomweber-sas commented 1 year ago

Ok, sorry it's after 5 on Friday for you, but I have figured this out and fixed it at main. Had to run tests and validate everything and dealing with that huge dataframe made things take longer too, for the initial diagnosis. Hopefully you can pull from there and try it out:

pip uninstall -y saspy
pip install git+https://git@github.com/sassoftware/saspy.git

The failures on the SAS server yesterday had me wondering, because after 5.8 million rows SAS started throwing errors for every one of the next 12 million rows, complaining about

2022-10-13T14:52:25,796 ERROR [00000009] :sastpw - ERROR: Maximum level of nesting of macro functions exceeded.

And I have no macros in the datastep I generated and ran. So I was really confused by that. But, that showed why the memory was being consumed; till it ran out and disconnected the client. Since it's IOM, it's buffering the log to be returned via the api (I'm guessing that's why it ran out of memory; seems like). So, I still need to debug my java IOM client code to see why it never got the disconnect, but, I'll worry about that next week. When it works; that never happens :)

What I finally found is that in the LIEU column, which is mostly numeric (MOSTLY), yet it's an object (so string), there are a few rows where that column has the following string: ';;;;;' or ';;;;'. The first occurrence of that is in row 5847832! What was happening is that my data step for streaming the data over uses 'infile datalines ...' and 'datalines4;' sending the data straight into SAS over stdin (inline data). ';;;;' is the string that terminates the data input for a data step specifying datalines4. So, after shy of 6million rows, the datastep terminated (cleanly - the table was good, but just 5.8 million rows), and the other 12 million rows were treated as more SAS code being processed by SAS. Obviously, all of it was wrong and not SAS code so it generated a ton or error output to the log and didn't really make sense.

I've handled this issue in df2sd() and have pushed it to main. If you can pull that and validate on your end that this fixes it for your test case (don't want to just say it worked for me), that would be great.

Thanks! Tom

AurelienThiery commented 1 year ago

Hi Tom,

Thanks !

But i can’t access github site (enterprise proxy block it).

Aurélien

De : Tom Weber @.> Envoyé : vendredi 14 octobre 2022 21:46 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Author @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

Ok, sorry it's after 5 on Friday for you, but I have figured this out and fixed it at main. Had to run tests and validate everything and dealing with that huge dataframe made things take longer too, for the initial diagnosis. Hopefully you can pull from there and try it out:

pip uninstall -y saspy pip install @.***/sassoftware/saspy.git

The failures on the SAS server yesterday had me wondering, because after 5.8 million rows SAS started throwing errors for every one of the next 12 million rows, complaining about

2022-10-13T14:52:25,796 ERROR [00000009] :sastpw - ERROR: Maximum level of nesting of macro functions exceeded.

And I have no macros in the datastep I generated and ran. So I was really confused by that. But, that showed why the memory was being consumed; till it ran out and disconnected the client. Since it's IOM, it's buffering the log to be returned via the api (I'm guessing that's why it ran out of memory; seems like). So, I still need to debug my java IOM client code to see why it never got the disconnect, but, I'll worry about that next week. When it works; that never happens :)

What I finally found is that in the LIEU column, which is mostly numeric (MOSTLY), yet it's an object (so string), there are a few rows where that column has the following string: ';;;;;' or ';;;;'. The first occurrence of that is in row 5847832! What was happening is that my data step for streaming the data over uses 'infile datalines ...' and 'datalines4;' sending the data straight into SAS over stdin (inline data). ';;;;' is the string that terminates the data input for a data step specifying datalines4. So, after shy of 6million rows, the datastep terminated (cleanly - the table was good, but just 5.8 million rows), and the other 12 million rows were treated as more SAS code being processed by SAS. Obviously, all of it was wrong and not SAS code so it generated a ton or error output to the log and didn't really make sense.

I've handled this issue in df2sd() and have pushed it to main. If you can pull that and validate on your end that this fixes it for your test case (don't want to just say it worked for me), that would be great.

Thanks! Tom

— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1279393723 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KBO4WLH5SBARWMV3Q3WDGZ7NANCNFSM6AAAAAARCNQKGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AD5A4KAG5ATNEHRVIPJDO3TWDGZ7NA5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMIH73W.gif Message ID: < @.> @.>

tomweber-sas commented 1 year ago

@AurelienThiery I sent the zip of the 'main' branch for you to download; you can then pip install from it; same as if you had access to github. This will be in the next release I publish, but it would be nice to verify it worked on your end, even though I'm sure it will. If you have to have a production version (what you get with just: pip install) I can build that next release with this. Just let me know. Thanks, Tom

AurelienThiery commented 1 year ago

Hi Tom,Ok thanks. I’m in training today but i can test the package tonight (this afternoon for you).AurélienLe 17 oct. 2022 à 04:20, Tom Weber @.***> a écrit : @AurelienThiery I sent the zip of the 'main' branch for you to download; you can then pip install from it; same as if you had access to github. This will be in the next release I publish, but it would be nice to verify it worked on your end, even though I'm sure it will. If you have to have a production version (what you get with just: pip install) I can build that next release with this. Just let me know. Thanks, Tom

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

AurelienThiery commented 1 year ago

Tom,

I was able to test the new package and it works fine, the table contains all data (18 437 883 lines) :

And the SAS session closed itself. Thanks !!

Other subject : last friday i tested the reading of « big » SAS table (6 GB on the disk : 7 905 149 lines, 281 columns) on the linux server with docker :

import pandas as pd

import saspy

import time

print('Ouverture de la session SAS sur le serveur...')

sas = saspy.SASsession(cfgfile='/opt/sascfg_personal.py')

print('Allocation de la bibliothèque...')

sas.saslib(libref='mylib', path='H:\my_data')

print('Chargement de la table SAS dans un dataframe...')

start = time.time()

sd = sas.sasdata('mvtpra4', 'mylib')

df = sd.to_df()

print('Temps de chargement = ', time.time()-start)

print(df.head())

The step « df = sd.to_df() » (to convert sas data in pandas dataframe object) finished in error and all server RAM was full !! (64GB)

I will remake the test when you will publish the new package on pypi repository.

Aurélien

De : Tom Weber @.> Envoyé : lundi 17 octobre 2022 04:21 À : sassoftware/saspy @.> Cc : THIERY Aurelien @.>; Mention @.> Objet : Re: [sassoftware/saspy] Convert dataframe to dataset (Issue #487)

@AurelienThiery https://github.com/AurelienThiery I sent the zip of the 'main' branch for you to download; you can then pip install from it; same as if you had access to github. This will be in the next release I publish, but it would be nice to verify it worked on your end, even though I'm sure it will. If you have to have a production version (what you get with just: pip install) I can build that next release with this. Just let me know. Thanks, Tom

— Reply to this email directly, view it on GitHub https://github.com/sassoftware/saspy/issues/487#issuecomment-1280182998 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5A4KH4K27BPZHIKEG7FYLWDSZXRANCNFSM6AAAAAARCNQKGQ . You are receiving this because you were mentioned. https://github.com/notifications/beacon/AD5A4KHFNV7U6Q4KKVQBORTWDSZXRA5CNFSM6AAAAAARCNQKGSWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSMJYFNM.gif Message ID: @. @.> >

tomweber-sas commented 1 year ago

Great to hear! I just published the new release; V4.3.5, that had this fix in it. So it's available now on Pypi; by tomorrow on Conda. I was also able to adjust the java IOM client code to catch this instead of hanging, so, although this won' t happen again, if some other thing like the sever terminating unexpectedly happens while streaming all this data, it should terminate and return to python with an exception. I'll close this one, but if you still have another problem, just open a new one for it! Thanks! Tom