sagemath / sage


command to gather build report on a platform/hardware combination #8048

Open 7c09a680-e216-4024-bb8e-9bfd4aa7f313 opened 14 years ago

7c09a680-e216-4024-bb8e-9bfd4aa7f313 commented 14 years ago

From sage-release:

>> I'm not saying that this wiki page is not a useful resource, but I don't
>> have the time or patience to go through this at each release.

> I too am frustrated by the wiki at times. Sometimes it won't allow
> me to save edits containing some strings, because it deemed those
> strings to be a sign of spam content or something. I would then spend
> minutes figuring out how to get the wiki to accept the edits or
> rewrite my edits.

> Anyway, if you want, you could send information according to the
> template below to sage-release and I'd update the wiki with the
> information you supply.

> OS version:
> Machine name:
> Architecture:
> 32/64 bit:
> RAM:
> Compiler:
> Build:
> Doctest:

Maybe you should add a command to sage, e.g.,

  sage: _build_report()

or

  sage -buildreport

that runs a script, gathers relevant information, then submits it
somewhere.  The resulting submission could then be summarized on a web
page.   This is a perfect example of where it would be far, far better
to spend time writing some code than doing something manually.
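
A minimal sketch of what such a command might collect, using only the Python standard library. The names `build_report` and `compiler_version` are hypothetical and not part of Sage. The RAM, Build and Doctest fields of the template are left out because they need platform-specific commands or the results of an actual build. The bit width reported is that of the Python interpreter running the script, which may differ from that of the OS.

```python
import platform
import shutil
import subprocess
import sys


def compiler_version(name="gcc"):
    """Return the first line of `<name> --version`, or None if unavailable."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        out = subprocess.run([path, "--version"], capture_output=True,
                             text=True, timeout=10).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    lines = out.splitlines()
    return lines[0] if lines else None


def build_report():
    """Collect the template fields that can be detected automatically."""
    return {
        "OS version": f"{platform.system()} {platform.release()}",
        "Machine name": platform.node(),
        "Architecture": platform.machine(),
        # Bit width of the running interpreter, not necessarily of the OS.
        "32/64 bit": "64" if sys.maxsize > 2**32 else "32",
        "Compiler": compiler_version() or "not found",
    }


if __name__ == "__main__":
    for field, value in build_report().items():
        print(f"{field}: {value}")
```

Submitting the result somewhere is deliberately not shown, since the thread does not say where reports should go.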

William

CC: @saliola @sagetrac-jasonbhill

Component: misc

Issue created by migration from https://trac.sagemath.org/ticket/8048

7c09a680-e216-4024-bb8e-9bfd4aa7f313 commented 14 years ago

Attachment: trac_8048-report-command.patch.gz

half-baked solution; based on Sage 4.3.2.rc0; don't use

7c09a680-e216-4024-bb8e-9bfd4aa7f313 commented 14 years ago
comment:1

I have attached a half-baked patch, which is not ready for review. I leave it here so I, or anyone, can work on it later on.

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 14 years ago
comment:2

Mathematica actually provides one called

  SystemInformation[]

http://reference.wolfram.com/mathematica/ref/SystemInformation.html

http://reference.wolfram.com/mathematica/guide/SystemInformation.html

It also provides information on the machine precision. A look at what they provide might be useful.

I would suggest we also include the value of exp(1.0), as that is system dependent, though care needs to be used in computing that so compilers don't inline the result.
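
One way this could look from Python, as a rough sketch: the argument is parsed from a string at run time, so the value comes from the platform's math library rather than from a constant folded in ahead of time. The function name `float_report` and the choice of fields are assumptions for illustration only.

```python
import math
import sys


def float_report(argument="1.0"):
    """Report exp(x) and basic double-precision parameters.

    The argument is parsed from a string at run time, so exp() is
    evaluated by the platform's math library on the machine being tested.
    """
    x = float(argument)
    value = math.exp(x)
    return {
        "exp(%s)" % argument: repr(value),
        "exp hex": value.hex(),  # exact bit pattern, easy to compare across machines
        "epsilon": sys.float_info.epsilon,
        "mantissa bits": sys.float_info.mant_dig,
    }
```

The hexadecimal form is included because two machines that differ only in the last bit would otherwise be hard to tell apart in a report.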

Some possible sources of information would be:

== Solaris ==

The best way to find out if a Solaris system is running out of memory is to use the scan rate ('sr' column) of 'vmstat'. (Don't even think about believing top)

== Linux ==

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 14 years ago
comment:4

I will take this and try to write it for Linux and SunOS. It should:

  1) Be capable of being called from within Sage after installation.
  2) Be capable of being called from the command line outside Sage, assuming an installation failure. (Assuming python exists.)
  3) Be capable of returning precision, etc., information from within Sage.
  4) Be scalable to a new OS.

Stay tuned.

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 14 years ago

shell script to determine somewhat verbose environment and hardware info, Linux only currently

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 14 years ago
comment:5

Attachment: hrdwr-info.sh.gz

Notes on hrdwr-info.sh as of July 15:

There are several things to consider:

  • Pruning info from /etc is not safe. It is not standardized, subject to user modifications, subject to lack of updates between distros (Ubuntu -> Mint etc). But, when the info exists it can be useful. It can also be badly formatted... with \r and \n explicit in the file. I used some 'sed' commands to get around this a bit.

Questions I have:

  • Can you test it, break it?

I'll post a link with sample output since it isn't formatting nicely in this forum wysiwyg editor.

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 14 years ago

example outputs

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 14 years ago
comment:6

Attachment: output-examples.txt

I had a look at this and have a few comments.

drkirkby@hawk:~$ command -v gcc
/usr/local/gcc-4.4.4-multilib/bin/gcc
drkirkby@hawk:~$ command -v g++
/usr/local/gcc-4.4.4-multilib/bin/g++
drkirkby@hawk:~$ command -v gfortran
/usr/local/gcc-4.4.4-multilib/bin/gfortran

('command -v' is portable - the use of 'which' is not). This will allow us to see if the compilers are a mix of compilers from places like /usr/bin and /usr/local/bin.
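
For a report generated from Python, `shutil.which` plays a similar role to `command -v`. The sketch below resolves the three compilers and flags the case where they come from different directories. The function names are hypothetical.

```python
import os
import shutil


def compiler_locations(names=("gcc", "g++", "gfortran")):
    """Map each compiler name to its resolved path on PATH (or None)."""
    return {name: shutil.which(name) for name in names}


def mixed_locations(locations):
    """Return True if the compilers that were found live in more than one directory."""
    dirs = {os.path.dirname(path) for path in locations.values() if path}
    return len(dirs) > 1


if __name__ == "__main__":
    found = compiler_locations()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    if mixed_locations(found):
        print("warning: compilers come from different directories")
```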

Overall, this looks very useful. I was quite impressed with it.

jhpalmieri commented 13 years ago
comment:7

I used this on a bunch of machines earlier today to help produce the page http://wiki.sagemath.org/skynet. Very nice. I don't know how useful the username, pwd, and shell lines are, but they don't hurt. On two machines, both with processors described by arch as "ia64" (is that itanium?), there is no line "model name" in /proc/cpuinfo, so the corresponding line in the summary printed by the script ends up blank. Otherwise, it seems to work well on all of the linux machines I tried.
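
The blank line could be avoided by falling back to other fields when "model name" is absent. The following is a sketch, not the logic of hrdwr-info.sh. The fallback keys "family" and "vendor" are an assumption about what such machines report and should be checked against a real /proc/cpuinfo from an ia64 system.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor entry."""
    processors, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor entry.
            if current:
                processors.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


def cpu_model(processor, fallback_keys=("model name", "family", "vendor")):
    """Return the first non-empty descriptive field, or 'unknown'."""
    for key in fallback_keys:
        if processor.get(key):
            return processor[key]
    return "unknown"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        for entry in parse_cpuinfo(f.read()):
            print(cpu_model(entry))
```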

On Mac OS X, I think a lot of the relevant information can be extracted by running the command sysctl -a. See http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man8/sysctl.8.html and http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/sysctl.3.html.

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 13 years ago
comment:8

jhpalmieri: Can you please send me (jason.b.hill@colorado.edu) the /proc/cpuinfo files for Cicero and Cleo? I have made some changes to the script via what David said above, but there are some things that I can improve if I know what those proc files look like, since there are a couple of cases when the script is catching errors and shouldn't be.

I'll respond more after making those changes and re-uploading the script.

jhpalmieri commented 13 years ago
comment:9

I've copied /proc/cpuinfo for cicero, cleo, and iras to the files

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 13 years ago
comment:10

How many physical processors do Iras and Cleo have? (My guess is 2.) Have you noticed that their "physical id" tags are very strange? The physical cpu labels are 0 and 3 instead of 0 and 1. I'm wondering if this is consistent with other Itaniums.

jhpalmieri commented 13 years ago
comment:11

How do I find out how many physical processors they have? A comment on some random web page suggests that if two "processors" in /proc/cpuinfo have the same physical address, then they're the same processor. How else do I tell?

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:12

Replying to @jhpalmieri:

> How do I find out how many physical processors they have? A comment on some random web page suggests that if two "processors" in /proc/cpuinfo have the same physical address, then they're the same processor. How else do I tell?

On Solaris or OpenSolaris:

kirkby@t2:[~] $  /usr/sbin/psrinfo -p
2

but I don't know about any other operating system.

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 13 years ago
comment:13

_Usually_ the following happens: If the system has 1 cpu, then no "physical id" will be present in /proc/cpuinfo. Otherwise, "physical id" gives the slot number of the cpu/core/thread in question. The problem with Itaniums is that the original "topology" listed the first slot as -1 on the board, instead of 0. So, it may be that there is still a bug in translating the topology.

Try looking in

  /sys/devices/system/cpu/cpu0/topology
  /sys/devices/system/cpu/cpu1/topology
  /sys/devices/system/cpu/cpu2/topology
  ...

and seeing how many different entries you can find. Those, if they exist, should correspond to board slots for cpus.

jhpalmieri commented 13 years ago
comment:14

Cleo has four different directories /sys/devices/system/cpu/cpuN/topology/, for N from 0 to 3. The file cpu0/topology/core_id is the same as that for cpu2 (and the ones for cpu1 and cpu3 are equal to each other), while the file cpu0/topology/physical_package_id is the same as that for cpu1, different from that for cpu2 and cpu3. Same situation for iras.
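
Counting distinct values in those files gives the number of physical packages and cores directly. This is a sketch under the assumption that `physical_package_id` and `core_id` exist and are readable for every logical CPU. The function names are hypothetical.

```python
import glob
import os


def read_topology(base="/sys/devices/system/cpu"):
    """Return {cpu_name: (physical_package_id, core_id)} for each logical CPU."""
    topology = {}
    for topo_dir in glob.glob(os.path.join(base, "cpu[0-9]*", "topology")):
        cpu = os.path.basename(os.path.dirname(topo_dir))
        try:
            with open(os.path.join(topo_dir, "physical_package_id")) as f:
                package = f.read().strip()
            with open(os.path.join(topo_dir, "core_id")) as f:
                core = f.read().strip()
        except OSError:
            continue  # file missing or unreadable: skip this CPU
        topology[cpu] = (package, core)
    return topology


def summarize(topology):
    """Count distinct packages, distinct (package, core) pairs, and logical CPUs."""
    packages = {package for package, _ in topology.values()}
    cores = set(topology.values())
    return {"packages": len(packages), "cores": len(cores),
            "logical": len(topology)}


if __name__ == "__main__":
    print(summarize(read_topology()))
```

For a layout like the one described for Cleo (two package ids, each with two core ids), `summarize` would report 2 packages, 4 cores and 4 logical CPUs. Because only distinct values are counted, odd labels such as 0 and 3 do not affect the result.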

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:15

It looks like on Solaris,

kirkby@t2:[~] $ netstat -an | grep LISTEN | awk '{print $1}' | grep '^\*\.8000$'
kirkby@t2:[~] $ 

can be used to determine if there is something listening on port 8000, which would be useful to know for Sage.
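
A check that does not depend on the output format of netstat is to try connecting to the port. The sketch below only tests TCP over IPv4 on the loopback address, which is an assumption about how the port would be used. The function name is hypothetical.

```python
import socket


def port_is_listening(port=8000, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on refusal.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "in use" if port_is_listening(8000) else "free"
    print(f"port 8000 appears to be {state}")
```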

Sorry that doesn't help with the physical processors. To be honest, it's not one of the most useful parameters though. I can't think of many Sage issues where the physical number of CPUs is going to be that important.

memconf seems pretty clever on 't2,math'

# ./memconf
Gathering data for memconf. This may take over a minute. Please wait...
hostname: t2
Sun Microsystems, Inc. T5240 (2 X 8-Core 8-Thread UltraSPARC-T2+ 1167MHz)
Memory Interleave Factor: 8-way
socket MB/CMP0/BR0/CH0/D0 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP0/BR0/CH0/D1 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP0/BR0/CH1/D0 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP0/BR0/CH1/D1 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP0/BR1/CH0/D0 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP0/BR1/CH0/D1 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP0/BR1/CH1/D0 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP0/BR1/CH1/D1 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP1/BR0/CH0/D0 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP1/BR0/CH0/D1 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP1/BR0/CH1/D0 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP1/BR0/CH1/D1 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP1/BR1/CH0/D0 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP1/BR1/CH0/D1 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP1/BR1/CH1/D0 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
socket MB/CMP1/BR1/CH1/D1 has a Samsung 501-7953-01 Rev 05 2GB FB-DIMM
empty sockets: MB/CMP0/MR0/BR0/CH0/D2 MB/CMP0/MR0/BR0/CH0/D3 MB/CMP0/MR0/BR0/CH1/D2 MB/CMP0/MR0/BR0/CH1/D3 MB/CMP0/MR0/BR1/CH0/D2 MB/CMP0/MR0/BR1/CH0/D3 MB/CMP0/MR0/BR1/CH1/D2 MB/CMP0/MR0/BR1/CH1/D3 MB/CMP1/MR1/BR0/CH0/D2 MB/CMP1/MR1/BR0/CH0/D3 MB/CMP1/MR1/BR0/CH1/D2 MB/CMP1/MR1/BR0/CH1/D3 MB/CMP1/MR1/BR1/CH0/D2 MB/CMP1/MR1/BR1/CH0/D3 MB/CMP1/MR1/BR1/CH1/D2 MB/CMP1/MR1/BR1/CH1/D3
total memory = 32768MB (32GB)

It knows how many CPUs there are, how many cores they have and how many threads there are.

Dave

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:16

One thing that would be useful is the free disk space on whatever file system someone is building Sage on. Not sure the best way to find that out. Any ideas?
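
If Python is available, `shutil.disk_usage` reports the space on the file system that contains a given path. A minimal sketch follows. The function name is hypothetical, and the path would need to be the directory Sage is being built in.

```python
import os
import shutil


def free_space_gib(path="."):
    """Return free space, in GiB, on the file system containing `path`."""
    usage = shutil.disk_usage(os.path.abspath(path))
    return usage.free / 2**30


if __name__ == "__main__":
    print(f"free space: {free_space_gib('.'):.1f} GiB")
```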

Dave

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 13 years ago

Attachment: hrdwr-info.2.sh.gz

3fb3767c-fe07-4534-9ed5-173c01a1087c commented 13 years ago
comment:17

I attached a new file. I made the 'useless cat' and other changes. (Thanks Dave!)

Some more notes:

It looks like the OS X commands are somewhat straightforward, and the hardware is much more standardized there. Dave: We have a couple of lists of relevant SunOS commands... can we develop a single list of what is useful and what isn't? I have OpenSolaris on a machine and so testing won't be an issue.

John:

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:18

Replying to @sagetrac-jasonbhill:

> I attached a new file. I made the 'useless cat' and other changes. (Thanks Dave!)

You are welcome.

> Some more notes:
>
>   • cpu throttling: It is easy to determine if a cpu supports throttling, but can be a pain to determine (across various Linux distributions) if it is enabled. Even from Ubuntu 9.10 to 10.04, this changed, and both are inconsistent with the Debian installations I've tried. So, I'm not incredibly optimistic here.

You do not surprise me. I came to the same conclusion on Solaris too.

>   • Getting the Sage banner isn't a problem, so long as I know my pwd and how that relates to the banner's location. I haven't yet assumed that this script is executed anywhere specific.
>   • I changed the gcc version command to output the entire output. The extra lines now included are only copyright info though, and so I'd question if we'd want them.

We should be able to get all the parameters that gcc was configured with

drkirkby@hawk:~$ gcc -v
Using built-in specs.
Target: i386-pc-solaris2.11
Configured with: /export/home/drkirkby/gcc-4.4.4/configure --prefix=/usr/local/gcc-4.4.4-multilib --enable-languages=c,c++,fortran --with-gmp=/usr/local/gcc-4.4.4-multilib --with-mpfr=/usr/local/gcc-4.4.4-multilib --disable-nls --enable-checking=release --enable-werror=no --enable-multilib --with-system-zlib --enable-bootstrap --with-gnu-as --with-as=/usr/local/binutils-2.20/bin/as --without-gnu-ld --with-ld=/usr/ccs/bin/ld
Thread model: posix
gcc version 4.4.4 (GCC) 

that lot can be very useful to know.
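
A report could pull the interesting fields out of that output as sketched below. `gcc -v` writes this information to standard error, so a caller would read `stderr`, for example from `subprocess.run(["gcc", "-v"], capture_output=True, text=True).stderr`. The function name `parse_gcc_v` is hypothetical.

```python
import shlex


def parse_gcc_v(text):
    """Extract target, configure options, thread model and version from `gcc -v` output."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and key == "Target":
            info["target"] = value
        elif sep and key == "Configured with":
            # The first token is the configure script itself; the rest are options.
            info["configure_options"] = shlex.split(value)[1:]
        elif sep and key == "Thread model":
            info["thread_model"] = value
        elif line.startswith("gcc version "):
            info["version"] = line.split()[2]
    return info
```

Keeping the configure options as a list makes it straightforward to check for a single flag, such as `--enable-multilib`, when comparing reports from different machines.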

>   • I'll add the netstat command (and/or equivalent).

OK.

>   • disk space is relatively easy. I'll add that as well.

How do you propose to get the disk space? I came to the conclusion that was quite difficult, and have not found a way.

> It looks like the OS X commands are somewhat straightforward, and the hardware is much more standardized there. Dave: We have a couple of lists of relevant SunOS commands... can we develop a single list of what is useful and what isn't? I have OpenSolaris on a machine and so testing won't be an issue.

Yes. It's late here. I will get onto that after some sleep.

Dave

> John:
>
>   • You want to update some of the skynet information you posted before. The cpu information for at least one of the machines changed (since I thought 'uniq' would apply 'sort' and it doesn't... meaning that the cores/cpu-count is off).
>   • I also added the correct greps to attempt to get information from those Itaniums. My best guess as to why those topology numbers are so strange is that those machines may actually accept 4 processors, but only have 2 plugged in.

It might be worth giving memconf a try, to see what it finds. I'm not suggesting including that, but it might confirm your guesses.

> If that's not the case, then it's buggy and there just aren't enough people using those cpus to make it worth fixing.

jhpalmieri commented 13 years ago
comment:19

Replying to @sagetrac-drkirkby:

> Replying to @sagetrac-jasonbhill:
>
>>   • You want to update some of the skynet information you posted before.

Thanks, I've done that.

> It might be worth giving memconf a try, to see what it finds.

memconf doesn't seem to be installed on the skynet machines, or at least I haven't found it.

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:20

Replying to @jhpalmieri:

> Replying to @sagetrac-drkirkby:
>
>> Replying to @sagetrac-jasonbhill:
>>
>>>   • You want to update some of the skynet information you posted before.
>
> Thanks, I've done that.
>
>> It might be worth giving memconf a try, to see what it finds.
>
> memconf doesn't seem to be installed on the skynet machines, or at least I haven't found it.

On some systems it runs as a normal user process. It can be downloaded in a few seconds; change the permissions to make it executable and you are ready to go. It does not need compiling; it's just a Perl script.

On 't2' it needs root access. I have no idea on other systems. I think it would work on mark without root access.