sakaki- / gentoo-on-rpi-64bit

Bootable 64-bit Gentoo image for the Raspberry Pi4B, 3B & 3B+, with Linux 5.4, OpenRC, Xfce4, VC4/V3D, camera and h/w codec support, weekly-autobuild binhost
GNU General Public License v3.0

Rebooting #53

Closed akjagadeesh closed 5 years ago

akjagadeesh commented 6 years ago

Whenever I shut down, I remove the power, then remove the micro SD card. When I reinsert the micro SD card and reconnect the power, the system will not boot again. Am I doing something wrong?

akjagadeesh commented 6 years ago

Also I want to clarify that I first click the shut down option in the demo user button on the top right corner of the screen. I wait for it to shut down and the screen goes blank. Finally I remove the power, and then remove the SD card. Later I put the SD card back in and connect the power again, but only yellow lights flicker on and off.

akjagadeesh commented 6 years ago

So basically nothing is happening, or the reboot process is taking way too long.

akjagadeesh commented 6 years ago

Also does the pi already have gdb located somewhere in the system?

sakaki- commented 6 years ago

@akjagadeesh - sorry to hear you are having problems with this. Is this a fresh copy of the image you are using, or have you installed any other packages or made other changes to the configuration? I have powered down & restarted (and also rebooted) many copies of the image without incident. You may have corruption on your microSD card (you could try running fsck on the two partitions on the card, if you have access to a linux PC).
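The fsck suggestion above could be sketched roughly as follows. This is only an illustration, assuming the card shows up on the Linux PC as a whole-disk device such as /dev/sdX or /dev/mmcblk0, that partition 1 is the FAT boot partition, and that partition 2 is the ext4 root. The function names part_name and print_fsck_commands are hypothetical; the second one only prints the check commands so you can review the device names before running anything.

```shell
#!/bin/sh
# Hypothetical helper: derive a partition device name from a whole-disk name.
# Linux appends "p<N>" when the disk name ends in a digit (mmcblk0 -> mmcblk0p1)
# and plain "<N>" otherwise (sdb -> sdb1).
part_name() {
    case "$1" in
        *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
        *)      printf '%s%s\n' "$1" "$2" ;;
    esac
}

# Print (do not run) read-only checks for both partitions of the card.
# Assumption: partition 1 is FAT (boot), partition 2 is ext4 (root).
print_fsck_commands() {
    disk=$1
    # -n: check only, never write to the filesystem
    echo "fsck.vfat -n $(part_name "$disk" 1)"
    # -f: force a full check, -n: open read-only and answer "no" to every fix
    echo "e2fsck -f -n $(part_name "$disk" 2)"
}

# Example: print_fsck_commands /dev/mmcblk0
```

The printed commands need root and should be run with both partitions unmounted.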

akjagadeesh commented 6 years ago

It is a fresh image, I write the genpi64.img into the sd card using etcher. I then insert it into the pi, and it boots spectacularly as always. I then shut it down as usual and remove the power afterwards, and reinsert the power to check if it would reboot.

I will check if it is a micro sd card corruption problem. Thank you for replying so swiftly!

Also, does your system have GDB or support GDB in any way? I'm having trouble downloading it using emerge. Very new to all this stuff.

akjagadeesh commented 6 years ago

File system is clean and I did a manual fsck. Is there anything else you can suggest?

sakaki- commented 6 years ago

If the system is not booting at all when you start it up the second time (i.e. you do not even get console text on the screen), then that suggests there is an issue with the kernel, the boot firmware, or the kernel module set. What brand and capacity is your microSD card incidentally?

One test you could do is to mount the first partition on a linux pc and run ls -l on the top-level directory. Then, check the contents of /lib/modules on the second partition, and paste the results here if you would (there won't be anything confidential on there at this stage, as you have started from a fresh image).
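As a rough sketch of the test described above, assuming the two partitions are already mounted somewhere on the Linux PC: the function name list_boot_and_modules, the device names, and the mount points in the comments are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: produce the two listings requested above.
# $1 = mount point of the first (boot) partition
# $2 = mount point of the second (root) partition
list_boot_and_modules() {
    boot_mnt=$1
    root_mnt=$2
    echo "== first partition: top-level directory =="
    ls -l "$boot_mnt" || return 1
    echo "== second partition: /lib/modules =="
    ls -l "$root_mnt/lib/modules" || return 1
}

# Example (assumed device names and mount points; adjust to your system):
#   sudo mount -o ro /dev/sdX1 /mnt/piboot
#   sudo mount -o ro /dev/sdX2 /mnt/piroot
#   list_boot_and_modules /mnt/piboot /mnt/piroot
```

Mounting read-only (-o ro) keeps the check from changing anything on the card.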

PS unfortunately gdb is not currently on the maintained autobuild list, so you'll have to emerge it locally (which will take time). I'll look at adding it to the autobuild list this weekend however (which will make the install a lot quicker, since emerge will pick up the resulting binary package).

akjagadeesh commented 6 years ago

I'm not near a linux pc right now, but I believe the microSD card is the one the raspberry pi people give you (it has the symbol on it). It has 16GB on it.

Also I was able to get gdb on the pi, but it took about 50 min for it to emerge.

sakaki- commented 6 years ago

BTW the other thing you can try, when you have the first partition mounted, is to edit the file config.txt in the top-level directory there, and comment out the line dtoverlay=vc4-fkms-v3d,cma-256, then save this file and try booting again (this will avoid using the vc4 graphics driver, so will make it more likely you see console error messages, if any are printed during early boot).
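For anyone who prefers not to edit the file by hand, the change could be scripted along these lines. This is a sketch only; the function name disable_vc4_overlay is hypothetical, and the path you pass in depends on where you mounted the first partition.

```shell
#!/bin/sh
# Hypothetical helper: comment out the vc4 overlay line in a config.txt.
# $1 = path to config.txt on the mounted first partition
disable_vc4_overlay() {
    cfg=$1
    # Keep a copy of the original next to it as config.txt.bak.
    cp "$cfg" "$cfg.bak" || return 1
    # Prefix an active "dtoverlay=vc4-fkms-v3d..." line with '#'.
    # Lines that are already commented do not match, so re-running is harmless.
    sed 's/^dtoverlay=vc4-fkms-v3d/#&/' "$cfg.bak" > "$cfg"
}

# Example (assumed mount point): disable_vc4_overlay /mnt/piboot/config.txt
```

You will probably need to run it as root, since the mounted boot partition is normally owned by root.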

akjagadeesh commented 6 years ago

So I still have not done the linux tests yet. However I will say it is not a hardware problem because I used a high end 64 GB micro sd card and that did not work either. So it's definitely not a hardware problem, it is a software problem.

Do you think etcher might have caused a problem with the booting? I mean it boots great the first time around.

sakaki- commented 6 years ago

It is unlikely that etcher is the issue. The system does autoexpand the root partition first time, so something may be going on there, although I would have expected others to have complained by now if that was the case. I'll have some time tomorrow to try a few tests on my RPi3 systems to see if I can reproduce this.

Are you using an RPi3 B or B+? Also, what network connection are you using (WiFi or Ethernet; presumably you have some as you emerged gdb - this is important as it impacts on setting the onboard clock).

akjagadeesh commented 6 years ago

I am using a RPi3 B+, I am using WiFi as well, in Greece if that makes any difference.

sakaki- commented 6 years ago

Hi, so I have just tried a fresh copy of the image on an RPi3B+, same steps as you describe. Boots fine again for me after the first boot / shutdown (from menu) / power off / microSD card removal / reinsertion / power on sequence.

If you get a chance, do try commenting out the dtoverlay=vc4-fkms-v3d,cma-256 entry as suggested above, as this will give you earlier indication of any kernel-related issues.

Are you using an official power supply?

akjagadeesh commented 6 years ago

Yes I am using the official power supply. And I apologize but I have not done the tests yet. I can tell you the results of those when I talk to my friend, an expert on linux machines, on monday.

akjagadeesh commented 6 years ago

Is this what you meant?

pgratz@pgratz-XPS-13-9350:~/tmp1$ ls -l
total 28536
-rwxr-xr-x 1 root root 24334 Jun 5 17:01 bcm2710-rpi-3-b.dtb
-rwxr-xr-x 1 root root 24597 Jun 5 17:01 bcm2710-rpi-3-b-plus.dtb
-rwxr-xr-x 1 root root 17341 Jun 5 17:01 bcm2837-rpi-3-b.dtb
-rwxr-xr-x 1 root root 52064 May 15 19:26 bootcode.bin
-rwxr-xr-x 1 root root 106 Jun 4 14:54 cmdline.txt
-rwxr-xr-x 1 root root 151363 Jun 5 17:01 config
-rwxr-xr-x 1 root root 2482 Jun 4 14:54 config.txt
-rwxr-xr-x 1 root root 18693 Jun 5 17:01 COPYING.linux
-rwxr-xr-x 1 root root 2595 May 15 19:26 fixup_cd.dat
-rwxr-xr-x 1 root root 6569 May 15 19:26 fixup.dat
-rwxr-xr-x 1 root root 9722 May 15 19:26 fixup_db.dat
-rwxr-xr-x 1 root root 9726 May 15 19:26 fixup_x.dat
-rwxr-xr-x 1 root root 13713920 Jun 5 17:01 kernel8.img
-rwxr-xr-x 1 root root 1494 May 15 19:26 LICENCE.broadcom
drwxr-xr-x 2 root root 16384 Jun 4 22:05 overlays
-rwxr-xr-x 1 root root 672804 May 15 19:26 start_cd.elf
-rwxr-xr-x 1 root root 4967492 May 15 19:26 start_db.elf
-rwxr-xr-x 1 root root 2824484 May 15 19:26 start.elf
-rwxr-xr-x 1 root root 3911492 May 15 19:26 start_x.elf
-rwxr-xr-x 1 root root 2690440 Jun 5 17:01 System.map

second partition:
pgratz@pgratz-XPS-13-9350:~/tmp2/lib/modules$ ll -l
total 4
drwxr-xr-x 3 root root 4096 Jun 5 17:09 4.14.44-v8-4fca48b7612d-bis+
pgratz@pgratz-XPS-13-9350:~/tmp2/lib/modules$ cd 4.14.44-v8-4fca48b7612d-bis+/
pgratz@pgratz-XPS-13-9350:~/tmp2/lib/modules/4.14.44-v8-4fca48b7612d-bis+$ ls -lart
total 1848
-rw-r--r-- 1 root root 330 Jun 5 17:01 modules.devname
-rw-r--r-- 1 root root 252359 Jun 5 17:01 modules.symbols.bin
-rw-r--r-- 1 root root 194497 Jun 5 17:01 modules.dep.bin
-rw-r--r-- 1 root root 11111 Jun 5 17:01 modules.builtin
-rw-r--r-- 1 root root 486298 Jun 5 17:01 modules.alias
-rw-r--r-- 1 root root 12220 Jun 5 17:01 modules.builtin.bin
-rw-r--r-- 1 root root 352 Jun 5 17:01 modules.softdep
-rw-r--r-- 1 root root 205220 Jun 5 17:01 modules.symbols
-rw-r--r-- 1 root root 133427 Jun 5 17:01 modules.dep
-rw-r--r-- 1 root root 501122 Jun 5 17:01 modules.alias.bin
-rw-r--r-- 1 root root 53283 Jun 5 17:01 modules.order
-rw-r--r-- 1 root root 40 Jun 5 17:02 owning_binpkg
drwxr-xr-x 11 root root 4096 Jun 5 17:09 kernel
drwxr-xr-x 3 root root 4096 Jun 5 17:09 .
drwxr-xr-x 3 root root 4096 Jun 5 17:10 ..

akjagadeesh commented 6 years ago

Hmm it all got crossed over for some reason but here you go I guess

sakaki- commented 6 years ago

Hi, those all look the same. So unless one of the boot or module files is corrupted (unlikely), the system should at least boot the kernel and mount root. If there was a problem after that (in the userland init code for example), you should at least get an error message on the screen.

I assume you haven't changed the disk UUID or anything like that? Have you got a physical screen attached, or are you connecting via ssh? Have you tried commenting out the dtoverlay=vc4-fkms-v3d,cma-256 line in /boot/config.txt (this will give you earlier console output)?

To double check your bootcode contents (since all the file lengths look fine), try running md5sum * on a linux box, inside the top-level directory of the first partition of the sdcard (mounted as above). This should yield something like the following (the order may differ):

d7810fab7487fb0aad327b76f1be7cd7  COPYING.linux
4a4d169737c0786fb9482bb6d30401d1  LICENCE.broadcom
242d22f4e4a9b4d74058cd1426e0ac03  System.map
b1b4ea71d5a46f378a905bae0363741f  bcm2710-rpi-3-b-plus.dtb
25adec360c5dcd0451067cbcafc829fc  bcm2710-rpi-3-b.dtb
6d62493f2bf35c09f7d8110c6e6d673b  bcm2837-rpi-3-b.dtb
17efaf1c1ef89289168d71cdc8194982  bootcode.bin
db0f5c229359d581ff5d01e57111be44  cmdline.txt
393bfcd4e5b7ec9b5748746e8a726b26  config
51b21ed604fba8bb5c7ef0b070c884e8  config.txt
db8e1a2b72cfa49188d7898aef94cff8  fixup.dat
ec063c178b6edad81daf227557ccafb7  fixup_cd.dat
aacf874b9ad169bf15e098770863ed1a  fixup_db.dat
42fd79d196c35b717064c337a864a06a  fixup_x.dat
be7bbc9edb8417094199f7364216de8e  kernel8.img
md5sum: overlays: Is a directory
b67c7c5d335b81b96901c3991d13526a  start.elf
166d27bad20dcc5748c03eac29a3cf03  start_cd.elf
b630f3a41d896096716580f0aa3738a8  start_db.elf
6514271d8c75eb3baeb73b63b99165a7  start_x.elf

Your config.txt file's checksum will not agree if you have edited it.
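Rather than comparing the hashes by eye, the list above could be saved to a file and checked automatically. A minimal sketch, assuming the reference lines were saved in md5sum output format (leave out the "overlays: Is a directory" line); the function name verify_boot_files and the file name boot.md5 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare files in a directory against a saved checksum list.
# $1 = directory to check (e.g. the mounted first partition)
# $2 = file holding reference lines in md5sum output format
verify_boot_files() {
    dir=$1
    # Resolve the list to an absolute path, since we change directory below.
    sums=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    # md5sum -c prints one OK/FAILED line per file and
    # returns a non-zero status if any file does not match.
    ( cd "$dir" && md5sum -c "$sums" )
}

# Example (assumed paths): verify_boot_files /mnt/piboot ~/boot.md5
```

An edited config.txt will be reported as FAILED, which is expected.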

akjagadeesh commented 6 years ago

So I tried commenting out that line and something may have happened on the screen but it was so fast that I could not see anything. Anyway I will do the md5sum tomorrow.

akjagadeesh commented 6 years ago

And I do have a physical screen connected

tomdoughty62 commented 6 years ago

Just stumbled upon this issue and can sadly say that I've had the same issue. Set up my Pi 3 on...Sunday, switched it off this morning and then today it didn't seem to boot. Commented out dtoverlay=vc4-fkms-v3d,cma-256 in config.txt and also enabled HDMI hotplug as well, just for good measure.

Awesome work Sakaki.

Edit: This worked for me, forgot to mention initially.

sakaki- commented 6 years ago

@tomdoughty62 - sorry to hear you're experiencing what looks like the same issue. I'd really like to get to the bottom of this, as I can't reproduce it here, despite repeated attempts. On your system (assuming you have access to a linux PC to mount the microSD card):

tomdoughty62 commented 6 years ago

Seems fine to me:

25adec360c5dcd0451067cbcafc829fc  bcm2710-rpi-3-b.dtb
b1b4ea71d5a46f378a905bae0363741f  bcm2710-rpi-3-b-plus.dtb
6d62493f2bf35c09f7d8110c6e6d673b  bcm2837-rpi-3-b.dtb
17efaf1c1ef89289168d71cdc8194982  bootcode.bin
db0f5c229359d581ff5d01e57111be44  cmdline.txt
393bfcd4e5b7ec9b5748746e8a726b26  config
6093f658c28f313a74c04b2160e0a585  config.txt
d7810fab7487fb0aad327b76f1be7cd7  COPYING.linux
ec063c178b6edad81daf227557ccafb7  fixup_cd.dat
db8e1a2b72cfa49188d7898aef94cff8  fixup.dat
aacf874b9ad169bf15e098770863ed1a  fixup_db.dat
42fd79d196c35b717064c337a864a06a  fixup_x.dat
be7bbc9edb8417094199f7364216de8e  kernel8.img
4a4d169737c0786fb9482bb6d30401d1  LICENCE.broadcom
md5sum: overlays: Is a directory
166d27bad20dcc5748c03eac29a3cf03  start_cd.elf
b630f3a41d896096716580f0aa3738a8  start_db.elf
b67c7c5d335b81b96901c3991d13526a  start.elf
6514271d8c75eb3baeb73b63b99165a7  start_x.elf
242d22f4e4a9b4d74058cd1426e0ac03  System.map

When I had the issue my TV simply didn't display/detect anything. My Pi flashed orange for a moment and had a solid red light (which I think is normal?) Sadly I didn't manage to grab any logs. I just tried commenting out dtoverlay=vc4-fkms-v3d,cma-256 and enabling HDMI hotplug.

Sorry I can't be more help.

tomdoughty62 commented 6 years ago

Oh something I left out of my post yesterday is that my Pi is working again. Haha, editing the config.txt file fixed it. Sorry I'm still a noob Linux user.

akjagadeesh commented 6 years ago

I hope that solves the problem. Sorry I haven't been responding. My deadline for the project has finished and I no longer have access to a monitor to do this. I might be able to do something next week.

sakaki- commented 6 years ago

@tomdoughty62 - glad you got it working. If you get a chance, could you try uncommenting dtoverlay=vc4-fkms-v3d,cma-256 again, but leaving HDMI hotplug enabled, to see if that variant boots?
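The variant asked for above could be set up with something like the sketch below. Note the assumption: the thread never names the exact setting, so this takes "HDMI hotplug" to mean hdmi_force_hotplug=1 in config.txt. The function name enable_vc4_keep_hotplug is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: re-enable the vc4 overlay but keep HDMI hotplug forced on.
# $1 = path to config.txt
# Assumption: "HDMI hotplug" refers to the hdmi_force_hotplug=1 setting.
enable_vc4_keep_hotplug() {
    cfg=$1
    # Keep a copy of the current file as config.txt.bak.
    cp "$cfg" "$cfg.bak" || return 1
    # Strip a leading '#' (and any spaces after it) from the vc4 overlay line.
    sed 's/^#[[:space:]]*\(dtoverlay=vc4-fkms-v3d\)/\1/' "$cfg.bak" > "$cfg"
    # Append the hotplug setting only if no active line already sets it.
    grep -q '^hdmi_force_hotplug=1' "$cfg" || printf '\nhdmi_force_hotplug=1\n' >> "$cfg"
}

# Example (assumed mount point): enable_vc4_keep_hotplug /mnt/piboot/config.txt
```

Running it twice leaves the file unchanged after the first pass, so it is safe to repeat between test boots.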

Also, once booted (in any variant) could you please run lsmod in a terminal and post the results?

Many thanks, S.

tomdoughty62 commented 6 years ago

Hey @sakaki- sorry for not responding. I have been meaning to do this for you, but after work I sit down and switch off. I tried uncommenting dtoverlay and the Pi still boots ok, but after coming home and switching it on I now have a black login screen with the following:

This is (none) (Linux aarch64....)
(none) login:

Sadly after this I can't seem to login, possibly because I've forgotten my password. This happens with dtoverlay commented as well.

thetooth commented 6 years ago

Can confirm this issue. RPi3B+ with a legit power supply (never getting the lightning bolt, no flashing red).

In my case I have a provisioning script that does only a few things: it creates a new user and deletes the demouser, then removes root's password and sets one for the new account, nothing special here. Then I emerge i3. Next it adds a config file to lightdm to auto login the new user with an i3 session. An i3 config is also copied that executes a single static binary, and this completes the list of modifications.

Here's what happens: On first boot everything works 100%, I can emerge, do a full system upgrade, etc. On second boot one of two things can happen: it either shows the rainbow block then goes blank (still a video signal), or, after mounting and commenting out the dtoverlay line, I can get it to show text 1 out of 5 power cycles. Eventually the SD card becomes heavily corrupted, resulting in NO VIDEO, no rainbow block, no nothing. While booting without the vc4 module, the init system locks up waiting for a network because Network Manager will not start (I see dhclient running but ONLY probing the WiFi interface).

If I run fsck on my host against the root partition I am seeing many issues:

e2fsck 1.42.13 (17-May-2015)
root contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 4304, end of extent exceeds allowed value
    (logical block 317, physical block 3115170, len 52)
Clear<y>? yes
Inode 4304, i_blocks is 2936, should be 2544.  Fix<y>? yes
Inodes that were part of a corrupted orphan linked list found.  Fix<y>? yes
-----------
about 1,500 lines claiming that it was fixed.
-----------
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -36906 -(336897--336899) -(348160--348168) -(3115170--3115221)
Fix<y>? yes
Free blocks count wrong for group #1 (262, counted=261).
Fix<y>? yes
Free blocks count wrong for group #10 (18816, counted=18823).
Fix<y>? yes
Free blocks count wrong for group #95 (28510, counted=28562).
Fix<y>? yes
Free blocks count wrong (1227147, counted=1227205).
Fix<y>? yes

root: ***** FILE SYSTEM WAS MODIFIED *****
root: 591171/3874640 files (0.2% non-contiguous), 2645915/3873120 blocks

Please keep in mind this happens regardless of if I do ANYTHING with the system but let it idle, it's just as OP said you only need to reboot it once for the failure to manifest. Just to be sure the correct way to reboot gentoo is sudo reboot right??? :P
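When repeating a check like the one above after each reboot, a read-only pass (e2fsck -f -n) followed by a look at the exit status shows whether the damage came back without changing the card. The sketch below decodes the documented e2fsck exit status bits; the function name describe_e2fsck_status and the device name in the comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn an e2fsck exit status into readable text.
# The status is a bit mask, so more than one condition can be reported.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "errors corrected, reboot advised"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    return 0
}

# Example (assumed device name, partition unmounted):
#   sudo e2fsck -f -n /dev/sdX2; describe_e2fsck_status $?
```

With -n, a damaged filesystem is reported as "errors left uncorrected" because nothing is written back.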

sakaki- commented 6 years ago

Hi thetooth,

sorry to hear you're having problems with this. Yes, sudo reboot should be fine (although you could prefix with an extra sudo sync to make sure).

There shouldn't be anything on the system itself that is causing root filesystem corruption (which you are seeing) unless perhaps there is an issue with a kernel storage driver, you ran out of space or inodes during an emerge, or powered down during a build, or maybe (unlikely but possible) issues with the SD card itself.
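To rule out the "ran out of space or inodes" case mentioned above, both figures can be read with df on the running system. A small sketch; the function name show_space_and_inodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show block and inode usage for one mount point.
# $1 = mount point to inspect (defaults to /)
show_space_and_inodes() {
    mnt=${1:-/}
    echo "blocks:"
    df -P -h "$mnt" || return 1    # -h: human-readable sizes
    echo "inodes:"
    df -P -i "$mnt" || return 1    # -i: inode counts instead of blocks
}

# Example: show_space_and_inodes /
```

A use percentage at or near 100% in either listing during an emerge would point to that cause.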

If you have an RPi3 B+ (or don't mind setting the OTP fuse on an RPi3 B - the 3B+ has this set by default) could you try writing the system to a USB stick (rather than microSD card) and booting it from that, then running your mod script etc. to see if the same issues persist?

If you are willing to share your setup script I could also try running it on one of my boxes to see if I can reproduce the issue here.

thetooth commented 6 years ago

Sorry for the late reply, other things came up, but I managed to resolve this issue as being a hardware problem. I did not try booting and running from USB, however I had consistent issues with SanDisk Ultra 16GB cards with serial numbers 8097DRJT40{XX}. After switching to a much older card of the same capacity and technology the problem has not resurfaced.

I cannot find ANY information on cards with these serial numbers so they may well have been counterfeit. The main indicator that supports this is the silk screen is very grainy compared to the older cards I have.

sakaki- commented 5 years ago

Old issue, closing now in pre 1.5.0 housecleaning. Hopefully if there is an underlying issue here it will be fixed in the (forthcoming) 1.5.0 release. Please feel free to re-open if the issue resurfaces. Best, sakaki